Results file low344-fig1-pir.res made by maryh on Wed 17 Apr 91 11:19:26-PDT.

Query sequence being compared = LOW344-FIG1.PEP

Number of sequences searched = 17731
Number of scores above cutoff = 382G

Results of the initial comparison of LOW344-FIG1.PEP with:
Data bank : PIR 25.0, all entries
-CTGGCCTT-GCTCGAGGC- -EftTa-:ftCUTGGCQCTGTGCCACAeSCC

550 560 570 580 



CGGCCTTGCGCCAGG-GTTGG 

590 600 



610 620 630 640 650 660
TGTTftAAGGCGGCCGCCCGGGCCftGCTTGGCCACCGGTGTTCCGGTAACCACTCACACGGCAGCAAGT 

l t i l i t i t ti i i i t i i i i i i i it i i i it ti t it i 

t i i i i i t i it iti t it i ti i iiii t i it ti t if i 

GCGCGGATCAAGGCG — CGGTCGGCCGAGG"! CGGGCTGGCGCGTGCGCCCTATTA-TCCCAGCGTCGC-GTT 
610 620 630 640 650 660 670 



670 680 690 700 710 720 730

CAGCGC — GATGGTGAGCGAGGCAG— GCCGC—CATTTTTGAGTCCGAAGCTTGAGCCCTCACGGGTTTG— TA 

t i i i t i it ill ii i t t it i tilt lilt ti ill itit 

i i i i i i ii lit ii i it it i iiit i t t i ti iii iiii 

GAGCGCCGGGCGTTCGGCGCAGCGGCGCAGCACGGGGTTGGG — CGAGGACGGGGTTCGCAACAATGTGATG 
680 690 700 710 720 730 740



740 750 760 770 780 790 

TTGGTCAC AGCGATGATACTG— ACGATTTGAGCTATC — TC-ACCGCCCTGCTGCGCGG-ATA-CCTCA 

iii it it i tit iti ( i ii i it iiiiiiiii i iiit i i i it 

iii ti ii i iii iti i t it i ii iiiiiiiii i tilt i i i ii 

GCGGTAACGCTGGCCTGGCGCCTGTTCGACTCGGGCGCCCGTTCGGCCGCCCTGC— GGGCGGCACAGGCGCA 
750 760 770 780 790 800 810 



800 810 820 830 840 850 860 

TC-GGTCTAGACCACATCCCGCACAGTGCGATTGGT— CTAGAAGATAATGCGAGTGCATCACCGCTCCTGGG 

i t i t i > t l l t i i • t lit tit ii t ii it ii ttt i iii t i 

i ii i ii it i t it t iti tit it t ii it it tit t lit it 

ACTGGACGAGGCCGC— GCAGGCCTATGGCG — CGGTGCTGCAGGACAA-GC— TG— GCCGAAGTGGTC GG 

820 830 840 850 860 870 



870 880 890 900 910 920 930 

CATCCGTT— CGTGGCAAACACGGGCTCTCTTGATCAAGGCGCT— CATCGACCAAGGCTACATGAAAC — AAA 

■ i it ii iii i : i : i : : t iitititiii ti tit itiii ti 

i I It II III i i t ! i t I I I i | | I I i I I I 11 III i t t t t 11 

CGCCTATTACGAGGCGGCCAC— GGCGC GGC AGGCGCTGCAT — ACTGCGG— TGGAAGACACGGAGA 

880 890 900 910 920 930



940 950 960 970 980 990 1000 

TC-CTCG — TTTC-GftATG^CTGGr.TGTrCG-GGTTTTCGAGCTATG — TCACCAACATCATGGACGTGATG 

i i t i i i t t i t tilt ti t it ti i i t tit t ti lit ii 

i t i t i iii t i itit ti t t t i i i i i iii i ti iti it 

TCGCCCGGCGTTCGGCCAGCnTCGCGGCGCGCCGCGCGCGCGC-AGGCCTGGACAGCCACGGCGATGTGCTG 
940 950 960 970 980 990 1000



1010 1020 1030 1040 1050 1060

GAT-CGCGTGAACCCC — GA-CGGGATGGCCT TCATTC-CACTGAGAGTGATCCCATTCTACGAGAGAA 

■ i i i i i iii it'll iiii it iii it i iii i tit i i 

ii<ii * iii ii it i iiii ii iii i< * iii i tit i i 

CATGCGCAGGCTGCCCTGGAGCGCGCGCGCCTGGCGCAGGCGCAGGCCGAAGGCGCGCAGGC— ACGCGCGCT 

1010 1020 1030 1040 1050 1060 1070 

1070 1080 1090 1100 1110 1120 1130

GGGCGTCCC AC AGE A A ACGCTGCCi' VSiG.CftTC ACT GTG ACTA ACCCGGCGCGGTTCTGTGTC ACCG ACTTGCC 

i i t i i tit iii i i t i i t ti iiii tt tt i i » i i t i ti 

I I I I I III III T t f t t t || lilt II t t 1 I I I I t I II 

G6CCGGC CTGGCGC AGG TCCTGGGICGT CG ATCCGGC— CACGCCGATCG — TC— CTGGCGCCGGGT — CC 

1080 1090 1100 1110 1120 1130 1140 



1140 1150 1160 1170 1180 1190 1200

GTGCATGACGC— -CATCTGGfiTCC i'TCCAG-GCAGCGGCCACTATTCCCCGTCAAGATACCGAACGATGAAG 

i t i lit t i i i f t i i i i iii i i I ti t i it ii ti iti 

I ti ill : i I i i i t i i i lit i i I ft t i ii ii ti til 

G CTGGCGCCGCAAC-GC;ftTCe< v .PL-r- : r:-,G!7GAGCTGGCCCAAT — GGCTG.CGGGA CGCCCG — GCAG 

1150 1160 1170 1180 1190 1200 



1210 1220 1230 1240 1250 1260 1270

TCGCGCATCGATCGAT — AGGCATC'P'CA^TGTGATCAGG — GCTGCCACCTCCAAAGCCG — GTGGCCACC 

it iiii iiii i t » • i t i t t t t ii i i i t i i tt t i ii i t i t i 

It lilt till llltt • IT I t II It t I I I t I It t I II t t I I I 

-CG-TCATCCCGCGATCAAGGCAGC-GCAGGCKGGCCTGGCAGCGGCCACCGCCCAGGTCGATGTGGCGCGG 
1210 1220 1230 1240 1250 1260 1270



1280 1290 1300 1310 1320 X

CCTGTCGATAGTnTTGi^J^'T/^^rSl'P'Ll'Gi-VrG^CCeTCiC — TTTTCGTGAACTGCAG 



o : 

ii i i t i i i i t i i i i i 

t i i i i i i i i t i i i i i 

SCORE 0 25 50 76 101 126 151 177 202 227

STDEV -19 



PARAMETERS

Similarity matrix           Unitary
Mismatch penalty            1
Gap penalty                 1.00
Gap size penalty            0.05
Cutoff score                s
Randomization group         0

K-tuple                     2
Joining penalty             20
Window size                 32

Initial scores to save      20
Optimized scores to save    so

Alignments to save          10
Display context             10




SEARCH STATISTICS

Scores:     Mean     Median     Standard Deviation
                     7          2.65


Times:      CPU               Total Elapsed
            00:03:03.0S       00:09:14.00



Number of residues = 4333063
Number of sequences searched = 17731
Number of scores above cutoff = 332S

Cut-off raised to 7.
Cut-off raised to 8.
Cut-off raised to 9.

The scores below are sorted by initial score.
Significance is calculated based on initial score.

A 100% identical sequence to the query sequence was not found.

The list of best scores is:

                                                        Init.  Opt.
Sequence Name  Description                     Length   Score  Score   Sig.   Frame



83 standard deviations above mean

 1. A28214     Phosphotriesterase - Pseudomon     325     227    251   83.28    0

4 standard deviations above mean

 2. VGBEGX     Secreted glycoprotein gX - Pse     438      19     47    4.90    0
 3. SYBSYX     Tyrosine--tRNA ligase - Bacill     419      19     43    4.90    0
 4. NOHUG      Enolase gamma - Human #EC-numb     434      18     62    4.52    0
 5. A27124     H+-transporting ATPase - Leish     974      18     68    4.52    0
 6. S02077     Enolase gamma - Human (fragmen     433      18     62    4.52    0
 7. A24742     Enolase gamma chain - Rat #EC-     434      18     63    4.52    0
 8. A24405     Ice nucleation protein - Pseud    1200      17     63    4.15    0
 9. QDBP4L     Hypothetical protein D-206 - b     206      17     36    4.15    0
10. A28852     Histone H3(1) - Tetrahymena py     135      17     27    4.15    0
11. B24255     Chorion class B protein L12 pr     132      17     28    4.15    0

3 standard deviations above mean

12. S01921     Hypothetical protein 1 - Chlam     451      16     19    3.77    0
13. S01022     Hypothetical protein P-2 - Chl      36      16     19    3.77    0
14. HSXL32     Histone H3.2 - African clawed      135      16     30    3.77    0
15. S03605     Surface glycoprotein CD14 prec     366      16     61    3.77    0
16. A24225     Transducin beta chain - Bovine     340      16     40    3.77    0
17. A24853     Transducin beta chain, liver -     340      16     40    3.77    0
18. A2606G     Segmentation protein eve - Fru     376      16     35    3.77    0
19. A25457     Transducin beta chain - Bovine     340      16     40    3.77    0
20. DAHUAL     Arachidonate 5-lipoxygenase -      674      16     65    3.77    0



The scores below are sorted by optimized score.
Significance is calculated based on optimized score.

A 100% identical sequence to the query sequence was not found.



The list of best scores is:



                                                        Init.  Opt.
Sequence Name  Description                     Length   Score  Score   Sig.   Frame

74 standard deviations above mean

 1. A28214     Phosphotriesterase - Pseudomon     325     227    251   74.70    0

5 standard deviations above mean

 2. WMBEBH     72K protein - Bovine herpesvir     664      11     72    5.42    0

4 standard deviations above mean

 3. QQBEE3     HHLF1 protein - Cytomegaloviru     788       9     70    4.64    0
 4. QQBEC3     HQRF1 protein - Cytomegaloviru     846       9     69    4.26    0
 5. A25902     65K protein antigen - Mycobact     588      15     69    4.26    0
 6. S05506     Phosphoenolpyruvate carboxylas     966       8     69    4.26    0
 7. S00893     Adenylate cyclase precursor -     1706      10     69    4.26    0
 8. S02389     Cyclolysin - Bordetella pertus    1706      10     69    4.26    0

3 standard deviations above mean

 9. A27124     H+-transporting ATPase - Leish     974      18     68    3.87    0
10. WFHUM      Mullerian inhibiting factor pr     560      13     68    3.87    0
11. GNNY5P     Genome polyprotein - Polioviru    2207      13     68    3.87    0
12. S04255     Regulatory protein qa-1S - Neu     918       8     68    3.87    0
13. GNWVWV     Genome polyprotein - West Nile    3430      10     68    3.87    0
14. QQBE8      Hypothetical BPLF1 protein - E    3149      10     67    3.48    0
15. B28894     Myeloperoxidase H7 - Human         830      11     67    3.48    0
16. OKBOG      Protein kinase, cGMP-dependent     670      11     67    3.48    0
17. DC2YPC     Pyruvate decarboxylase - Zymom     559       8     67    3.48    0
18. S02386     cyaB protein - Bordetella pert     712      10     67    3.48    0
19. VGBEPB     Glycoprotein gIII precursor -      479       9     67    3.48    0
20. S00896     Ferredoxin--nitrite reductase      594       8     67    3.48    0

1. LOW344-FIG1.PEP
   A28214       Phosphotriesterase - Pseudomonas diminuta MG plasm

ENTRY           A28214     #Type Protein
TITLE           Phosphotriesterase - Pseudomonas diminuta MG plasmid
                  pCMS1
SOURCE          Pseudomonas diminuta
ACCESSION       A28214
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     McDaniel C.S., Harper L.L., Wild J.R.
   #Journal     J. Bacteriol. (1988) 170:2306-2311
   #Title       Cloning and sequencing of a plasmid-borne gene (opd)
                  encoding a phosphotriesterase.
SUPERFAMILY
   #Name        phosphotriesterase
KEYWORDS
COMMENT         THIS SEQUENCE HAS NOT BEEN COMPARED TO THE
                  NUCLEOTIDE TRANSLATION.
SUMMARY         #Molecular-weight ~S : J7U3  #Length 325  #Checksum 539
SEQUENCE

Initial Score    = 227   Optimized Score = 251   Significance = 74.70
Residue Identity = 76%   Matches         = 260   Mismatches   = 51
Gaps             = 28    Conservative Substitutions           = 0

X 10 20 30 40 50 60 70



M9TRRWLKSA A AGTLLGGL AGC A I" WLDRS AQ A IGSI RARP I T I SEAGFTLTHED I CGSS AGFLR AWPEFFG 
t t i i i t i t i i i i i i i t i i » » i i i • i i i i i t i i i i t i i i i i I i i i i » t i » i i i i ii 

I I I I I I I I t I I I I I I I I I » f 1 I I I 1 ! 1 t t I t I I I I I I I I I t I I t I I I I I t I I t II 

MQTRR WLKSftAARTL.LGGa.AGCA TWLDR3AQ AMRS I RARP I T I SEAGFTLTHED I SAARQDSCVLGQS 

X 10 20 30 40 50 60 

80 90 100 110 120 130 

SRKALAEKAVRGLR— ARAAGVR T I VD VSTFD I GRDVSLLAEVSRAADVH I VA ATGLWFDPPLSM 

I t ill ii ii ill I I t t I I I i i i i i i i i 

i i lit ii ii ill i i i I i t i I i i i t I i i 

SSVAOSSSGKGCER I ARQSGWRANHCRCVDFRYRSRRGF I GR GFAGCRR SYLAATGLWFDPPLSM 

70 80 90 100 110 120 130

140 150 160 170 180 190 200

RLRY VEELT QFFLRE 1 9 YG I EDTG I RAG 1 1 K V ATTGK ATPF9ELVLK A AAR ASL ATG.V PVTTHTA AS 

I i i i i i > i t i i i t i i t i t i I t i i i t i i i i i i i i i i I i i i i t i i t i I i t t l i i I i i ( 

i » i i i i i i t i i i t » i i i i i t i i t i » i i i » » i i * t » i i i i » i i i t i » t t t i t » i t » i 

RLR Y VEELTL VLP A VRFNM ASK Y TG I PAG 1 1 KVATTGKATPFQELVLKAAARASLATGVPVTTHTAAS 

140 150 160 170 180 190 200

210 220 230 240 250 260 270

ORDGERGRPPFLSPKLEPSRVC I GHSDDTDDLS YLTALLRGYLI GLDH I PHSA I GLEDNASASPLLG I RSWQ 

i I I i f i I i i i i i i I I i ■ I i i i I i t ■ i i i < i i t t I I i t I i I i i I * I i i ( t i i i I I i I I I t i i i i I ■ i i i I i i i 
• i i i t i i i i i i i i i i i ( i i i i i i i t t i t ; t t i i i i i ti i i i i i i t i • < t • i i i i i i i i i i i i i i i i t f i i i i 

ORDGERGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTALLRGYL I GLDH I PHSA I GLEDNASASPLLG I RSWQ 
210 220 230 240 250 260 270

280 290 300 310 320 X 330 

TRALL I K AL I DOGYMKG I LVSNDWLFGFSSYVTN I MDVMDR VNPDGMAF I PLRV I PFYERR 

i t i i > i > i > i > i i i i i i i i i t t i t ' r : t > t : : t > t t i t t t i i i i i i i i i i 
t i i i • i i i i t i • t i t i t > i t i t i t ) t f i ' i r t t i i t t t t t i ■ i r i i > * i t 

TRALL I K AL I DOGYMKG I L VSNDWi J=GFSSY VTN I MDVMDRVNPDGMAF I H 

280 290 300 310 320 X 



2. LOW344-FIG1.PEP
   WMBEBH       72K protein - Bovine herpesvirus (type 2, strain

ENTRY           WMBEBH     #Type Protein
TITLE           72K protein - Bovine herpesvirus (type 2, strain
                  BMV)
DATE            31-Mar-1990 #Sequence 31-Mar-1990 #Text 31-Mar-1990
PLACEMENT       1386.0  1.0  2.0  1.0  1.0
SOURCE          bovine mammillitis virus, bovine herpesvirus 2
ACCESSION       B29242
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     Hammerschmidt W., Conraths F., Mankertz J., Pauli
                  G., Ludwig H., Buhk H.J.
   #Journal     Virology (1988) 165:388-405
   #Title       Conservation of a gene cluster including
                  glycoprotein B in bovine herpesvirus type 2
                  (BHV-2) and herpes simplex virus type 1 (HSV-1).
   #Comment     The amino acid sequence is not given in this paper.
COMMENT         The DNA sequence was obtained from GenBank, release
                  61.0.
COMMENT         This virus is a member of the family Herpesviridae.
SUPERFAMILY
   #Name        herpesvirus infected cell protein ICP18.5
SUMMARY         #Molecular-weight 7P3S7  #Length 664  #Checksum 8190
SEQUENCE

Initial Score    = 11    Optimized Score = 72    Significance = 5.42
Residue Identity = 28%   Matches         = 101   Mismatches   = 203
Gaps             = 83    Conservative Substitutions           = 0




X 10 20 30 40 50 

MOTPRWLK SflftAta f! -L- -J3SL — AGCA7V/LDRS A0A I GSIRARP I T I SEA-GFTLTHED I 

t It! I t t I I | I III | 

* It* t I I I | | t III t 

RLAGK I CDHVT00ARVRL DftLlEMRS -INLPI WVGLSEARRARfiLHALEVSSKMTEANSGGFAEAPGPAAAQE — 
230 X 240 250 260 270 280 290



60 70 80 90 100 110 120
CGSSAGFLRAWPEFFGSRK AL AEK AVRGLR-ftRAAGVRT I VDVSTFD I GRD-VSLL AEVSR AADVH I V 

i i i i i ii ii lit ill i lilt I i 

lilt t t tttt lit lit i tilt i i 

REASA-LLDAHHV'FKSAPF'GL — YAVSE-"! .RFWLSSGDRT — SGSTVDAFADNLSALAERERRYETGAVAVEL 
300 310 320 330 340 350 360 



130 140 150 160 170 180
AATG LWFDPPLSMR ! .RYVEELT QFFLRE I QYG I EDTGI RAG 1 1 K VATTGK ATPFQELVL 

lit it i i i i lit i ii 

ill i i tttt t i t i i t i 

AAFGRRGEHFDRTFGDRVAS! ..DMV) 'AL FVGG9SAAPDDQ I EALVRACYNHHLSAP VLRQLAGSE 

370 380 390 400 410 420



190 200 210 220 230 240

— KAAARASLATBVPVTTHTA— ASU'iRDG ERGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTALLRGYL I 

■ lit t i ii i til tilt it 

tti t it t i till til i ■ t 

HGDAEALRSALEG LMAAEDPFGDGNAEKEARRAPSL GGGPEDDWAALAARAAADVGARR 

430 440 450 460 470 480 



250 260 270 280 290 300 310

GLDH I PHSA I GLEDNAS ASPLLG I r iSWOTRALL I KALI DQG Y MKQ I LVSN — DWLFGFSSYVTN I MD 

i it lilt i ill i ii i 

t it till i til t it i 

RL YADRLTKRSLAS — LGRHVF.EQF^GELEKMLRVSTYGEVLPTVFAAVCNGFAARTRFCELTARAGT 

490 500 510 520 530 540 



320 330 340 X 

VMD-R VNPDGM AF I PLR V I PFYERRAS— HRKRCGASL 

i i t tit i : i i t : i • 

t I t I I I t I 111 1 ! t 

V I DNRGNPD — TFDTHR FMfi ASLMRHRVDPftLLPG I THGFFE 

550 560 570 580 



3. LOW344-FIG1.PEP
   QQBEE3       HHLF1 protein - Cytomegalovirus (strain AD169)

ENTRY           QQBEE3     #Type Protein
TITLE           HHLF1 protein - Cytomegalovirus (strain AD169)
DATE            30-Sep-1989 #Sequence 30-Sep-1989 #Text 31-Dec-1989
PLACEMENT       1353.0  1.0  1.0  2.0  1.0
SOURCE          human cytomegalovirus, human herpesvirus 5
ACCESSION       C27349
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     Weston K., Barrell B.G.
   #Journal     J. Mol. Biol. (1986) 192:177-208
   #Title       Sequence of the short unique region, short repeats,
                  and part of the long repeats of human
                  cytomegalovirus.
COMMENT         The DNA sequence was obtained from EMBL, release 13.
COMMENT         This virus is a member of the family Herpesviridae.
GENETIC
   #Name        HHLF1
SUPERFAMILY
   #Name        cytomegalovirus HQRF1 protein
SUMMARY         #Molecular-weight 83981  #Length 788  #Checksum 7858
SEQUENCE

Initial Score    = 9     Optimized Score = 70    Significance = 4.64
Residue Identity = 23%   Matches         = 90    Mismatches   = 241
Gaps             = 55    Conservative Substitutions           = 0



X 10 20 30 
MQ fPRVVLKSr iP. nO J t uLf-^LAGOm/LDRSfl-EAIG 



40 50
SI R ARP I T I SE AGFTLTHED 



ASAPHPASLLTAVRRHI -H&RL CCGA 'LA 1 . .LsftV : L FARWLGiCAAGPATGT AAGTTSPPAASGTETEAAGGDAPCA 
330 X 340 350 360 370 380 390 



60 70 80 90 100 110 120

I CG — SSAGFLRAWPEFFGS:<i\A L/-Y- 1 r . AVRGLR ARA AGVRT I VDVSTFD I GRDVSLL AEVSRAADVH 

it ii i i t t i I t t i < ll i i iiii 

ii it i i i i i i t t i i ii i t iiii 

I AGAVGSAVPVPPGPY'iAAGGGA I CVPNADAHAWGADAAAAAAPTVMVGSTAMAGPAAS — GTVPRAMLW 
400 410 420 430 440 450 460

130 140 150 160 170 180 
I VAATGLWF— DPF'LSMRLRYVEELTQFFLRE I OYG I EDTG- I RAG I I K VATTGK ATPFQE LVLKA 

IIII III t I I I I I 

iiii i t i i i i i I i 

LLDELGAVFGYCPLDB-IVYPLAAELSHFLRAGVLGALALGRESAPAAEAARRLLPELDREQWERPRWDALHL 
470 480 490 500 510 520 530 

190 200 210 220 230 240 250

A ARASLATGVPVTTHT AAS9RDGER6RPPFLSPKLEPSRVC I GHSDDTDDLSYLTALLR — GYL I GLD — H I 

lit i l i i t i i i it ii ill 

ill i I : i t i t i it it ill 

HPRAALWAREP— HGQLAFLL1?PG—RGEAEVLTLATKHPAICANVEDYLQD ARRRADA.QALGLDLATV 

540 550 560 570 580 590 600 

260 270 280 290 300 310 

PHSA I G LEDNASASPI .LG I RSWC3TRALL I KAL I DQGYMKQ I LVSNDWLFGFSSYVTN I MDVMD 

I I ! I t It t 1 II I I 

II I I I I I It II I I 

VMEAGG0MIHKKTKKPKGKEOESLMKGKHSRYTR— PTEPPLTPQASLGRALRRDDEDWKPS RLPGED 

610 620 630 640 650 660

320 330 340 X 

RVNPDGMAF I PL R V I PF YERR- ASHRKRC9ASL 

t i i iii it 

it t tit it 

SWYDLDETFWVLGSNRKNDVYQRRWKKTVLRCGLEIDRPMPTVPKG 
670 680 690 700 X 710



4. LOW344-FIG1.PEP
   QQBEC3       HQRF1 protein - Cytomegalovirus (strain AD169)

ENTRY           QQBEC3     #Type Protein
TITLE           HQRF1 protein - Cytomegalovirus (strain AD169)
DATE            30-Sep-1989 #Sequence 30-Sep-1989 #Text 30-Sep-1989
PLACEMENT       1358.0  1.0  1.0  1.0  1.0
SOURCE          human cytomegalovirus, human herpesvirus 5
ACCESSION       C2607S
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     Weston K., Barrell B.G.
   #Journal     J. Mol. Biol. (1986) 192:177-208
   #Title       Sequence of the short unique region, short repeats,
                  and part of the long repeats of human
                  cytomegalovirus.
COMMENT         The DNA sequence was obtained from EMBL, release 13.
COMMENT         This virus is a member of the family Herpesviridae.
GENETIC
   #Name        HQRF1
SUPERFAMILY
   #Name        cytomegalovirus HQRF1 protein
SUMMARY         #Molecular-weight 91047  #Length 846  #Checksum 2604
SEQUENCE

Initial Score    = 9     Optimized Score = 69    Significance = 4.26
Residue Identity = 22%   Matches         = 86    Mismatches   = 248
Gaps             = 46    Conservative Substitutions           = 0



X 10 20 30 

MQTRRV VLKSAAAGTLLGGLAGCATWLDRSA-QA I G- 



40 50 
-S I RARP I T I SEAGFTLTHED 



ASAPHPASLLTAVRRHLNQRLCCGWLALGAVLPARWLGCAAGPATGTAAGTTSPPAASGTETEAAGGDAPCA 



330 X 340 



350 



360 370 380 390



60 70 80 90 100 110 120 

I CG — SSAGFLRAWPEFFGSRK A LA-EK AVRGLRARAAGVRT I VDVSTFD I GRDVSLLAEVSRAADVH 

ii ii i i i t i i i i i i ti i i iiit 

ii ii i i i ii i i ii ■ ii i i iii i 

I AGAVGSAVPVPPQPYGAAGGGA I CVPNADAHA WGADAAAAAAPTVMVGSTAMAGPAAS — GTVPRAMLW 
400 410 420 430 440 450 460 

130 140 150 160 170 180
I VAATGLWF-DPPLSMRLR YVEELT9FFLRE I OYG I EDTG- I RAG I I K VATTGK ATPFBE LVLKA 

i t i i iii i i i i i i 

i i i f tit i i i i i i 

LLDELGAVFGYCPLDGHVYPLAAELSHFLRAGVLGALALGRESAPAAEAARRLLPELDREQWERPRWDALHL 
470 480 490 500 510 520 530 

190 200 210 220 230 240 250

AAR ASLATG VPVTTHT AASGRDGE -RGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTALLRG YL I GLDH I PHS 

ill ii l ii i i it i it t 

■ it it i it i i t i t tit 

HPRAALWAREP HG9WEFMFREORGDP I NDPLAFRLSDARTLGLDLTT VMTER0S9LPEK Y I GFY9 IRKP 

540 550 560 570 580 590 600

260 270 280 290 300 310 
A I GLEDNASASPLLG I RSWQT RALL. 1 1< AL I DOGYMKQ I LVSNDWLFGFSSYV TN I MDVMDRVNP 

i iiit ii ii ii ii 

■ i i i i ii it ii ii 

PWLME 9PPPPSR9TKPDAATMPPFLSAQASVSYALRYDDESWRPLSTVDDHKAWLDLDESHWVLG 

610 620 630 640 650 660 

320 330 340 X 

D-GMAF I PLR-V I PFYERRASHRKRCQASL 

i ii it i 

i ii it i 

DSRPDD I KQRRLLK ATQRRGAE I DRPMP WPEEC YDORFT 
670 680 690 700



5. LOW344-FIG1.PEP
   A25902       65K protein antigen - Mycobacterium leprae

ENTRY           A25902     #Type Protein
TITLE           65K protein antigen - Mycobacterium leprae
SOURCE          Mycobacterium leprae
ACCESSION       A25902
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     Mehra V., Sweetser D., Young R.A.
   #Journal     Proc. Nat. Acad. Sci. USA (1986) 83:7013-7017
   #Title       Efficient mapping of protein antigenic determinants.
SUMMARY         #Molecular-weight 61855  #Length 588  #Checksum 3048
SEQUENCE

Initial Score    = 15    Optimized Score = 69    Significance = 4.26
Residue Identity = 23%   Matches         = 90    Mismatches   = 231
Gaps             = 58    Conservative Substitutions           = 0



X 10 20 30 40 50 
MQTRRVVLKSAA P.tf i LUV.3L AGCATWLD7SSA9A I GS I R ARP I T I SEAGFTLTHED 



VPGRDGETQPASCGRPSRALHPfiriVSr .^C?<rPVT~ 
10 20 30 



-LASFL I RRNHFAM AK T I A YDEEARRGL — ER 
40 50 60 



60 70 80 90 100 110 120

I CGSSAGFLRAWPEFFGSRKA' AEKAVPGL RP.RAAGVRT I VDVSTFD 1 GRDVSLLAEVSRAADVH I VA 

• • t i tt ii i ii tiii ti 

» t It 1 ! I| 1 II I II I It 

GLNSLADAVK VTLG!-'KG-F:i\!V vL E: ; . - w.^PT I TNDGVSI AKE I ELEDPYEK I G — AELVKEV — AKKTDDV A 
70 80 90 100 110 120 130



130 140 150 160 170 180 190

ATGLWFDPPLSMRLR Y VEEL r©FR...PE ' HY G 1 1 -DTGi 1 PAG I 1 K VATT-GK ATPFGELVLKAAARASLA 



GDGTTTATVLAOi-,'.---- VK.E . r- . Ii -W^li .KR.2. ; EKAVDKVTETLLKDAKEVETKES I AATAA I S 

140 150 160 170 180 190



200 210 220 230 240 250

TGVPVTTHTAASORD- -G:ERb: :PFr LSPKL —EFSRVO I G — HSDDTDDLSYLT ALLR6YL I GLDH 

1 tit; I til t I t I | i 

I I I ! t t I i 1 I ( || t I 

AGD9S I GDL I AEAMDK VENELW I TVEESN I FGL&LEL lEGMRFDKGY I SGYFVTDAEROEAVLEEPY I LLVS 
200 210 220 230 240 250 260

260 270 280 290 300 310 320

I PHSA I GLEDNASASPLLG X PSWQTPAU. i KP.L. I DQGYMKQ I LVSNDWLFGFSSYVTN I MDVMDR — VNPDG 

i ; t i ;ii t lit ii ii 

i tit tiii i til t t it 

SKVS TVKDLLPI _LEKVI'2ARi<3LLI lAEDVEGEALSTLVVNKIRGTFKSVAVKAPGFGDRRKAMLQD 

270 280 290 300 310 320 330

330 340 X 

MAF I PLRV I PFYERRASHRKRCQAfSL 

lit i t t 

> t t t t t 

MA- 1 LTGAQV I SEEVGLTLEI -4T DL3LLC iKARKWMT 
340 350 360 370



6. LOW344-FIG1.PEP
   S05506       Phosphoenolpyruvate carboxylase 1 - Common ice pla

ENTRY           S05506     #Type Protein
TITLE           Phosphoenolpyruvate carboxylase 1 - Common ice plant
                  #EC-number 4.1.1.31
SOURCE          Mesembryanthemum crystallinum #Common-name common
                  ice plant
ACCESSION       S05506
REFERENCE
   #Authors     Cushman J.C., Bohnert H.J.
   #Journal     Nucleic Acids (1989) 17:6745
   #Title       Nucleotide sequence of the gene encoding a CAM
                  specific isoform of phosphoenolpyruvate
                  carboxylase from Mesembryanthemum crystallinum.
   #Reference-number S05506
   #Accession   S05506
   #Molecule-type DNA
   #Residues    1-966 <CUS>
REFERENCE
   #Authors     Rickers J., Cushman J.C., Michalowski C.B., Schmitt
                  J.M., Bohnert H.J.
   #Journal     Mol. Gen. Genet. (1989) 215:447-454
   #Title       Expression of the CAM-form of phospho(enol)pyruvate
                  carboxylase and nucleotide sequence of a full
                  length cDNA from Mesembryanthemum crystallinum.
   #Reference-number S02716
   #Accession   S02716
   #Molecule-type mRNA
   #Residues    1-966 <RIC>
SUMMARY         #Molecular-weight 110S59  #Length 966  #Checksum 725
SEQUENCE

Initial Score    = 8     Optimized Score = 69    Significance = 4.26
Residue Identity = 23%   Matches         = 94    Mismatches   = 212
Gaps             = SO    Conservative Substitutions           = 0

X 10 20 30 40 50 

MQTPRVVLKSA^AGl '! ± ZPL . AGiC AT WLDRSA9 A 1 GS I R ARP I T I SEAGFTLTHED I 

i i tiii it i ii iti 

i i tt tt it t it itt 

SVRRSLLQKHGR I FDD. . At?L .YftK D x TPDDi X'ELDEALQRE I QAAFRTDE I RRTQPTPQDEMRAGMS YFHET I 

180 190 200 210 220 230 240

60 70 80 90 100 110



f COBSftGFLRAWPEFFuSRk^LAEKrtVRuLRAR AAGVRT I VDVSTFD I GRDVSLLAEVSR 

III I I I ! II t II* III 

WNQVPKFLR RLE) TALK N I 0« I TER VPYNAPL I GFSSWMGGDRDGNPRVTPEVTRDVCLLA-RMM 

250 260 270 280 290 300 310

120 130 140 150 160 170 

AADVH I VAATGLWFDPPLSM PLR-YVEELTGFFLRE I ©YG IEDTGI RAG I I K VATTGK ATPFQE — L 

ii li tit ii lit i ii iii 

ii 11111 it t t i i t t iii 

AANMYFS9 I DELMF — ELSMWRCTDELRER AEELHK YSKRDSKH Y I E FWKGIPSSEPYR 

320 330 340 350 360

180 190 200 210 220 230 

VLKA AARASLATGV P V— TTHTAASQRDGERGRPPFLSP-KLEPSRVC I GHSDDTDDLS 

ii i it i tt tt t lit i i ii 

t i i ii t ft ii i i t i i i it 

V I L ADVRDKL Y YTRERSRBLL ASEV3E I PV EATFTE I D9 FLEPLELCYRSLCACGDRPVADGS 

370 380 390 400 410 420 430 

240 250 260 270 280 290 300 

YL TALLRGYL I GLDH I PHSA I GLEDNAS ASPLLG I RSWQTRALL I KAL I DQGYMKQ I LVSNDWLFGF 

1 I III | III! I I I lit 

I I III I I I I I I t I lit 

LLDFMROVATFGLCLVKLD I RGESERHTQVHDA I TTHLG I GS — YRDWTEEKRQD — WLLSELRGKRPLFGP 
440 450 460 470 480 490 

310 320 330 340 X 

SSYVT— N I MD VMDR VNPDGMAF I PLR V 1 PF Y-ERRAGHRKRCQASL 

i i i i i i it i t i 

i i t i i i ii t i i 

DLPRTDE I ADVLDT I N — V I AELPSDSFG A VV I SMAT APSDVLAVELLQRECK VKK 
500 510 520 530 540 X 550 



7. LOW344-FIG1.PEP
   S00893       Adenylate cyclase precursor - Bordetella pertussis

ENTRY           S00893     #Type Protein
TITLE           Adenylate cyclase precursor - Bordetella pertussis
                  #EC-number 4.6.1.1
INCLUDES        probable haemolysin
SOURCE          Bordetella pertussis
ACCESSION       S00893
REFERENCE
   #Authors     Glaser P., Ladant D., Sezer O., Pichot F., Ullmann
                  A., Danchin A.
   #Journal     Mol. Microbiol. (1988) 2:19-30
   #Title       The calmodulin-sensitive adenylate cyclase of
                  Bordetella pertussis: cloning and expression in
                  Escherichia coli.
   #Reference-number S00893
   #Accession   S00893
   #Molecule-type DNA
   #Residues    1-1706 <GLA>
   #Cross-reference EMBL:Y00545
GENETIC
   #Name        cya
SUMMARY         #Molecular-weight 177508  #Length 1706  #Checksum 6461
SEQUENCE

Initial Score    = 10    Optimized Score = 69    Significance = 4.26
Residue Identity = 23%   Matches         = 93    Mismatches   = 206
Gaps             = 97    Conservative Substitutions           = 0

X 10 20 30 40 

MQTRR WLKS A A AGTLL6GL AGC ATWLDRS AQ A I GS I R ARP I T I — -SEA 

t it t i j * i iii i i i 

■ it i t t i i tii i i i 

LMTOFGRAGSTNTPQE^IASLS AAVKGL- GEASSAVAE1 VSGFFRGSSRWAGGFGVAGGAMALGGG I AAAVGA 

490 500 510 520 530 540 550



50 60 70 80 90 100 110

GFTLTHEDICG — SSAGFLRAWPEFFR SRK ALAEK AVRGLR AR AAGVRT I VDVSTFD I GRD VSLL AE 

ill i t i i i tiii till i i 

lit I it : i tiii till i i 

GMSLTDDAPAGQK AAAGAE I AL QLTfSlVf Vfr I .ASS I Al .A LAAARGVTSGLQVAG ASAG 

560 570 580 590 600

120 130 140 150 160 170 

VSRA ADVH I VAATGLWFDPPLSMR1 .RYVEB'l TOFFLRE I QYG I E DTGIRAGI IKVATT 

ti ii i t i iti ( i i t i 

ii ii i i i lit t i i i i 

AAAGALAAAI-SPMEIYGLVQQiJHvVu30LDKLAQES3AYGYEGDALLAQLYRDKTAAEGAVAGVSAVLST 

610 620 630 640 650 660 670 

180 190 200 210 220 230 

GKATPFQELVLKAAARASLATGVPVTTH T AASQRDG ERGRPPFLSPKLEPSRVC I GHSDDTDDLSYL 

i i iii it lit i i i t t it ii 

i i t t i i t i i * it t t t ii It 

VGA A VS I AAA— AS-WSAPVAWT — SLLTGALNG I LRGVQGP 1 1 EKL AND Y ARK I DELGGP 

680 690 700 710 720 730 

240 250 260 270 280 290 300 

T A LLRGYL I GLDH I. PHSf U GL EDNAS ASPLLG I RSWQTRALL I K AL I DQGYMKQ I,L VSNDWLFGF 

I t t i it i it ii ill 

i ti I ti i ii ii lit 

QAYFEKNLOARHEQLANSDGI .RKML ADLOAGWNASSV I G VOTTE I SKSAL ELAA I TGNADNL — K 

740 750 760 770 780 790 

310 320 330 340 X 

SSYVTN I MDVMDR VNPDGM AF IP — LRVI PF YERR ASHRKRCQ ASL 

it i i i t I i it it i 

ti i t lit: itit i 

SVDV F VDRF V9GERV AG8P WLDV A AGG I D I AS-RKGERP ALTF I TPLAAPG 

800 810 820 830 840 



8. LOW344-FIG1.PEP
   S02389       Cyclolysin - Bordetella pertussis

ENTRY           S02389     #Type Protein
TITLE           Cyclolysin - Bordetella pertussis
INCLUDES        adenylate cyclase #EC-number 4.6.1.1\ hemolysin
SOURCE          Bordetella pertussis
ACCESSION       S02389
REFERENCE
   #Authors     Glaser P., Sakamoto H., Bellalou J., Ullmann A.,
                  Danchin A.
   #Journal     EMBO J. (1988) 7:3997-4004
   #Title       Secretion of cyclolysin, the calmodulin-sensitive
                  adenylate cyclase-haemolysin bifunctional protein
                  of Bordetella pertussis.
   #Reference-number S02386
   #Accession   S02389
   #Molecule-type DNA
   #Residues    1-1706 <GLA>
GENETIC
   #Name        cyaA
COMMENT         THIS SEQUENCE HAS NOT BEEN COMPARED TO THE
                  NUCLEOTIDE TRANSLATION.
FEATURE
   1-312        #Domain adenylate cyclase <ADE>\
   313-1706     #Domain hemolysin <HEM>
SUMMARY         #Molecular-weight 177476  #Length 1706  #Checksum 6271
SEQUENCE

Initial Score    = 10    Optimized Score = 69    Significance = 4.26
Residue Identity = 23%   Matches         = 93    Mismatches   = 206
Gaps             = 97    Conservative Substitutions           = 0



X 1 0 20 30 40 
MOTRRVVLK SAAAGTLL A&CATWLDRSAQA I GS I R ARP I T I SEA 



LMTQFGRAGSTNTP^EftftSL .S&fW; - SL- -SErtSf JAVAEl VSGFFRQSSRWAQGFGVAGGAMALQQG I AAAVGA 

490 500 510 520 530 540 550 



50 60 70 80 90 100 110

GFTLTHEDICG — SSSAGFLRAWPE.-FG SRK ALAEK AVRGLRARAAGVRT I VDVSTFD I GRDVSLLAE 

ill l it i (ill i i ii i i 

ill i , t i i i tiii i i i i > i 

GMSLTDDAPAGQK A A AG AE I ALQL l 'GS T VEL.ft'iS I ALA LAA ARG VTSGLQ VAG ASAG 

560 570 580 590 600



120 130 140 150 160 170

VSRAADVHI VAATGLWFDPPLSMRLRYVEEL1 GFFLRE I Q YG I E DTGIRAGI IKVATT 

ii it j i i ill i l i i i 

ii ii i i t tit i t i l i 

AAAGALAAALSPMEIYGLVQ&BHYAUSLDKLABESSAYGYEGDALLAQLYRDKTAAEGAVAGVSAVLST 

610 620 630 640 650 660 670



180 190 200 210 220 230

GKATPFQELVLKAAARASLAT6VPVTTHTAAS9RDG ERGRPPFLSPKLEPSRVC I GHSDDTDDLSYL 

i i iti t i tii t ; i ii ii ii 

i i iii ti tit it i ii ii ti 

VGA A VS I AAA— AS-WGAPVAWT — SLLTGALNG 1" LRGVQQP 1 1 EKL AMD YARK I DELGGP 

680 690 700 710 720 730 



240 250 260 270 280 290 300 

TA LLRGYL I GLDH I PHSA I GL EDNAPASPLLG I RSWQTRALL I K AL I DQGYMKQ I LVSNDWLFGF 

i ii i ii i it ii tii 

i ii i i i i i i ii lit 

9 AYFEKNLQARHEGLANSDGLRKMi . ADLO AGWN ASSV I G VOTTE I SKSAL EL A A I TGNADNL — K 

740 750 760 770 780 790 



310 320 330 340 X 

SSYVTN I MDVMDRVNPDGMAF I P— -LRV I FF YERRASHRKRCQASL 

ii i i : i t t ttit i 

ii i i tiii iiii i 

SVD V FVDRF VQGERV ARQP WLD V A AGS 3CDI AS-RKGERPALTF I TPLAAPG 

800 810 820 830 840



9. LOW344-FIG1.PEP
   A27124       H+-transporting ATPase - Leishmania donovani

ENTRY           A27124     #Type Protein
TITLE           H+-transporting ATPase - Leishmania donovani
                  #EC-number 3.6.1.35
ALTERNATE-NAME  proton-transporting ATPase
SOURCE          Leishmania donovani
ACCESSION       A27124
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     Meade J.C., Shaw J., Lemaster S., Gallagher G.,
                  Stringer J.R.
   #Journal     Mol. Cell. Biol. (1987) 7:3937-3946
   #Title       Structure and expression of a tandem gene pair in
                  Leishmania donovani that encodes a protein
                  structurally homologous to eucaryotic
                  cation-transporting ATPases.
   #Comment     The authors translated the codon AGA for residue 352
                  as Lys.
SUMMARY         #Molecular-weight 107476  #Length 974  #Checksum 834
SEQUENCE

Initial Score    = 18    Optimized Score = 68    Significance = 3.87
Residue Identity = 24%   Matches         = 93    Mismatches   = 210
Gaps             = 82    Conservative Substitutions           = 0



X 10 20 30 40 50

MQTRS WLKa AA AGTLLGGLAGCATWLDRSA9A I GS I RARP I T I SEAGFTLTHE- 



FLDPPRPDTKDT I RRSKEYGVDVKM I TCiDHLLI AisEMC-PMLIlLDPNI LTADKLPQ I KDANDLPEDLGEK YG 

500 510 520 530 540 550 560 



60 70 80 90 100 110 120

DICGSSAGFLRAWPEFFGSRi<ftLA!i"KAVi?GLRAR AflGVRT I VDVSTFD I GRDVSLLAEVSRAADVH I VA 

i t it it : i • i t i it i t i t 

till t » i i r i i t ii iii i 

DMMLSVGGF AOVFPF HK.FM I "LRURGYTC AMTGDGVNDAPALKRADV — G I AVHGATDAARAA 

570 580 590 600 610 620 630



130 140 150 160 170
ATGLWFDPPLSMRLR YVE L-LTSFFL RF 1 9 YG I Er'DT G I RAG 1 1 KVATTGK A TPFQ E 

i iii ii iii iii i iii ii 

i iii ii iii iii i iii ii 

ADMVLTEPGLS WVEAMLVSREVF0P.MLSFLTYR I SATLGLVCFFF I ACFSLTPK AYGS VDPHFQFFHL 

640 650 660 670 680 690 700 



180 190 200 21 0 220 230 240 

L VLK AAARASLATGVPVT THTAASOROeEPGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTALLRGYL I G 

ii iii t i it i i t till 

ii tit ii It it i till 

PVLMFMLI TLLNDGCLMT I GYDHV T PS ERPOKWNL-PWFVS AS I L A A V ACGSSLM 

710 720 730 740 750 



250 260 270 280 290 30O . 310 

LDH I PHSA I GLEDN ASASPLL6 I RSW9TR ALL IKALI DQGYMKG I LVSNDWLFGFSS YVTN I MDVMDR 

i i i i i t iii i i t i i iii iii 

i i i i i i iii i i iii (it iii 

L LW I GLE GYSS9YYENSWFHRLGLAQLPQGKLVTMMYLK- I S I S-DFLTLFSSRTGGHFFFYMP 

760 770 730 790 800 810 



320 330 340 X 

VNP — DGMAF I PLRV I PFYERRAS — HRKRCQASL 

l lilt ii i i i 

t iiit ii t i t 

PSP I LFCGA 1 1 SLLV STMAASFWHKSRPDNVLTEGLAWGQTN 

820 830 S40 850 SGO 



10. LOW344-FIG1.PEP
    WFHUM     Mullerian inhibiting factor precursor - Human

ENTRY           WFHUM     #Type Protein
TITLE           Mullerian inhibiting factor precursor - Human
ALTERNATE-NAME  Mullerian inhibiting substance (MIS)
DATE            13-Aug-1986 #Sequence 13-Aug-1986 #Text 30-Jun-1987
PLACEMENT       536.0  5.0  1.0  1.0  1.0
SOURCE          Homo sapiens #Common-name man
ACCESSION       A01397
REFERENCE       (Sequence translated from the DNA sequence)
   #Authors     Cate R.L., Mattaliano R.J., Hession C., Tizard R.,
                Farber N.M., Cheung A., Ninfa E.G., Frey A.Z.,
                Gash D.J., Chow E.P., Fisher R.A., Bertonis J.M.,
                Torres G., Wallner B.P., Ramachandran K.L., Ragin
                R.C., Manganaro T.F., MacLaughlin D.T., Donahoe
                P.K.
   #Journal     Cell (1986) 45, 685-698
COMMENT         This protein is homologous to transforming growth
                factor beta, inhibin alpha chain, and inhibin A
                and B chains. The area of best homology
                corresponds to the mature proteins.
COMMENT         Although it does not compete with EGF for receptor
                binding sites, MIS can inhibit the
                autophosphorylation of the EGF receptor in vitro.
GENETIC
   #Introns     138/1, 185/3, 222/1, 275/2
SUPERFAMILY     #Name inhibin
KEYWORDS        testicular glycoprotein\ gonadal differentiation\
                antitumor agent\ Mullerian duct\ TGF-beta homolog\
                inhibin homolog
FEATURE
   1-25         #Domain signal and propeptide sequence
   26-560       #Protein Mullerian inhibiting factor <MAT>\
   64,329       #Binding-site carbohydrate (Asn) (potential)
SUMMARY         #Molecular-weight 59192  #Length 560  #Checksum 3812
SEQUENCE

Initial Score    =   13   Optimized Score =  68   Significance = 3.87
Residue Identity =  24%   Matches         =  99   Mismatches   =  211
Gaps             =   92   Conservative Substitutions           =    0



X 10 20 30 40 50 

MGTRR WL-KS A A AGTLLG-GLA GCATWL— DRSAGA- I GS I RARP I T- I SEAGFTL 

iiit it i tit i i ii ii i i 

tiii ii tiii i i ti ii i i 

LPGAQSLCPSRDTRYLVLAVORPftilAWRGSGLPiLTLOPRGEDSRLSTARLQPiLLFGDDHRCFTRMTPALLLL 

130 200 210 220 230 240 250 



60 70 80 90 100 110 

THED I CGSSA-GFLRAWPEFFGSRK ALAE K AVRGLRARAAGVRT I VDVSTFD I GRDVSLLAEVSR-AA 

i i t ( it it it i iii 

til i it it it ■ iii 

PRSEPAPLPAHGQLDTVPFFPPRPSAELEESPPSADPFLETLTRLVR ALRVPPARASAPRLAL 

260 270 280 230 300 310 



120 130 140 ISO 160 170 ISO 

DVH I VA — ATGL — WFDPPLSMRLRYVEELTQFFLRE I QYGIEDTG I RAG 1 1 K VATTGK ATPFQELV l_K 

i i ii ii t t ii t t iii ii ii 

i i t i it iiit ii tit ii ii 

DPDALAGFPBGLVNLSDPAALERLLDGEEPLLLLLR PTAATTGDPAPLHDPTSAPWATALARRVAAELQ 

320 330 340 350 360 370 380 



190 200 210 220 230 240 

AAA — RASLATGVPVTTHTAA3QRDGERGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTAL — LRGYL I GLD- 

iii ii t t i i ii i ii i iii it ii 

iii it ii t t it i ii i tit it ti 

AAAAELRSLPGLPPATAPLLft RLLALCP-GGP GGLGDPLRALLLLKALQGLRVEWRGRDP 

390 400 410 420 430 440 

250 260 270 280 290 300 310 

H I PHS A- I GLEDN AS ASP-LLG I RSWSTRA LL I KAL I DQGYMKQ I L VSNDWLFGFSSYVTN I MD V 

ii i i t t i t ii iii i i 

ii t it t it it iii i t 

RGPGRA9RSAGATAADGPCALREL SVDLRAERSVL I PET Y6ANNCQG — VCGWPQSDRNPRY6 

450 460 470 480 430 500 



320 330 340 X 

MDR VNPDGM AF I PLRV I PFY ER-RASHRKRCQ ASL 

i i i t i i tilt i 

i i iiit iiit i 

NH WLLLKM9 ARG AAL ARPPCC VPTAY AGK! . L I SLSEER I SAHHVPNM V ATECGCR 
510 520 530 540 550 X 560 



Results file low344-fig1-spt.res made by maryh on Wed 17 Apr 91 11:34:12-PDT.

Query sequence being compared:  LOW344-FIG1.PEP
Number of sequences searched:   15409
Number of scores above cutoff:  375S

Results of the initial comparison of LOW344-FIG1.PEP with:
Data bank : Swiss-Prot 14, all entries

[Histogram of the score distribution: NUMBER OF SEQUENCES (logarithmic axis, 0 to 100000) plotted against SCORE (0 to 227), with a second STDEV axis.]



PARAMETERS

Similarity matrix          Unitary     K-tuple                2
Mismatch penalty           1           Joining penalty       20
Gap penalty                1.00        Window size           32
Gap size penalty           0.05
Cutoff score               S
Randomization group        0

Initial scores to save     20          Alignments to save    10
Optimized scores to save   20          Display context       10

SEARCH STATISTICS

Scores:      Mean     Median     Standard Deviation
             6        7          2.69

Times:       CPU             Total Elapsed
             00:03:15.00     00:13:53.00

Number of residues:             4914263
Number of sequences searched:   15409
Number of scores above cutoff:  375S

Cut-off raised to 7.
Cut-off raised to 8.
Cut-off raised to 9.

The scores below are sorted by initial score.
Significance is calculated based on initial score.

A 100% identical sequence to the query sequence was not found.

The list of best scores is:

                                                            Init.  Opt.
Sequence Name   Description                      Length    Score  Score   Sig.  Frame

                **** 82 standard deviations above mean ****
 1. OPD$PSEDI   PHOSPHOTRIESTERASE (EC 3.5.-        325      227    251  82.28    0
                ****  4 standard deviations above mean ****
 2. MTC3$CHVN1  MODIFICATION METHYLASE CVIBIII      377       19     25   4.84    0
 3. SYY$BACCA   TYROSYL-TRNA SYNTHETASE (EC 6.      419       19     43   4.84    0
 4. VGLX$PRV    SECRETED GLYCOPROTEIN GX (GENE      438       19     47   4.84    0
 5. ENOG$RAT    GAMMA ENOLASE (EC 4.2.1.11) (2      433       18     63   4.47    0
 6. ENOG$HUMAN  GAMMA ENOLASE (EC 4.2.1.11) (2      433       18            4.47    0
 7. ATXA$LEIDO  PROBABLE E1-E2 TYPE CATION ATP      974       18     68   4.47    0
 8. ICEN$ERWHE  ICE NUCLEATION PROTEIN (GENE N     1258       18     63   4.47    0
 9. ATXB$LEIDO  PROBABLE E1-E2 TYPE CATION ATP      974       18     68   4.47    0
10. H31$TETPY   HISTONE H3.1.                       135       17     27   4.10    0
11. KCCA$RAT    CALCIUM/CALMODULIN-DEPENDENT P      478       17     57   4.10    0
12. KCCA$MOUSE  CALCIUM/CALMODULIN-DEPENDENT P      478       17     57   4.10    0
13. ICEN$PSESY  ICE NUCLEATION PROTEIN (GENE N     1200       17     63   4.10    0
14. CHA2$BOMMO  CHORION CLASS A PROTEIN L12 PR      132       17     28   4.10    0
15. Y206$LAMBD  HYPOTHETICAL PROTEIN ORF206.        206       17     36   4.10    0
                ****  3 standard deviations above mean ****
16. ARSA$HUMAN  ARYLSULFATASE A PRECURSOR (EC       507       16     64   3.72    0
17. HMEV$DROME  SEGMENTATION PROTEIN EVEN-SKIP      376       16     35   3.72    0
18. GBB1$HUMAN  GUANINE NUCLEOTIDE-BINDING PRO      340       16     40   3.72    0
19. H32$XENLA   HISTONE H3.2.                       135       16     30   3.72    0
20. CD14$MOUSE  CD14 DIFFERENTIATION ANTIGEN P      366       16     61   3.72    0
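
The Sig. column above appears to be the number of standard deviations by which a score exceeds the mean of all scores, which is also how the banner lines group the hits. The following is a minimal sketch of that reading, not the search program's own code: the function name is hypothetical, and the mean (6) and standard deviation (2.69) are the rounded values printed under SEARCH STATISTICS, so the results differ from the listed values in the second decimal place.

```python
def significance(score: float, mean: float, stdev: float) -> float:
    """Standard deviations by which `score` exceeds the mean score.

    Assumption: this z-score reading is inferred from the report's
    "standard deviations above mean" banners; it is not documented here.
    """
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (score - mean) / stdev


# Rounded statistics as printed in the report (initial scores).
MEAN, STDEV = 6.0, 2.69

for score in (227, 19, 18, 17, 16):
    print(f"initial score {score:3d} -> {significance(score, MEAN, STDEV):6.2f}")
```

With these rounded inputs an initial score of 19 gives about 4.83 against the listed 4.84, which suggests the program used an unrounded mean slightly below 6.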



The scores below are sorted by optimized score.
Significance is calculated based on optimized score.

A 100% identical sequence to the query sequence was not found.

The list of best scores is:

                                                            Init.  Opt.
Sequence Name   Description                      Length    Score  Score   Sig.  Frame

                **** 76 standard deviations above mean ****
 1. OPD$PSEDI   PHOSPHOTRIESTERASE (EC 3.5.-        325      227    251  76.20    0
                ****  4 standard deviations above mean ****
 2. YHL1$HCMVA  HYPOTHETICAL PROTEIN HHLF1.         788        9     70   4.74    0
 3. YQR1$HCMVA  HYPOTHETICAL PROTEIN HQRF1.         846        9     69   4.34    0
 4. CYAA$BORPE  CALMODULIN-SENSITIVE ADENYLATE     1706       10     69   4.34    0
 5. CAP1$MESCR  PHOSPHOENOLPYRUVATE CARBOXYLAS      966        8     69   4.34    0
                ****  3 standard deviations above mean ****
 6. MIS$HUMAN   MULLERIAN INHIBITING FACTOR PR      560       13     68   3.95    0
 7. ATXA$LEIDO  PROBABLE E1-E2 TYPE CATION ATP      974       18     68   3.95    0
 8. POLG$WNV    GENOME POLYPROTEIN (CAPSID PRO     3430       10     68   3.95    0
 9. ATXB$LEIDO  PROBABLE E1-E2 TYPE CATION ATP      974       18     68   3.95    0
10. ATP0$OENBI  ATP SYNTHASE ALPHA CHAIN, MITO      511        9     68   3.95    0
11. POLG$POL2L  GENOME POLYPROTEIN (COAT PROTE     2207       13     68   3.95    0
12. PGKH$WHEAT  PHOSPHOGLYCERATE KINASE, CHLOR      480        8     68   3.95    0
13. EXON$HSV11  ALKALINE EXONUCLEASE (EC 3.1.1      626        9     67   3.55    0
14. KGP$BOVIN   CGMP-DEPENDENT PROTEIN KINASE       670       11     67   3.55    0
15. KGPB$HUMAN  CGMP-DEPENDENT PROTEIN KINASE,      686       11     67   3.55    0
16. VGL3$PRV    GLYCOPROTEIN GIII PRECURSOR.        479        9     67   3.55    0
17. PHYB$ARATH  PHYTOCHROME B (GENE NAME: PHYB     1172        9     67   3.55    0
18. ATI1$HSV11  ALPHA TRANS-INDUCING FACTOR 73      693        9     67   3.55    0
19. PGCA$RAT    CARTILAGE-SPECIFIC PROTEOGLYCA     2124        9     67   3.55    0
20. PYR1$YEAST  CARBAMOYL-PHOSPHATE SYNTHETASE     1456       11     67   3.55    0



1. LOW344-FIG1.PEP
   OPD$PSEDI    PHOSPHOTRIESTERASE (EC 3.5.-) (GENE NAME: OPD).

ID   OPD$PSEDI      STANDARD;      PRT;   325 AA.
AC   P13739;
DT   01-JAN-1990 (REL. 13, CREATED)
DT   01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE)
DT   01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE)
DE   PHOSPHOTRIESTERASE (EC 3.5.-) (GENE NAME: OPD).
OS   PSEUDOMONAS DIMINUTA.
OG   PLASMID PCMS1.
OC   PROKARYOTA; BACTERIA; GRAM-NEGATIVE AEROBIC RODS AND COCCI;
OC   PSEUDOMONADACEAE.
RN   [1] (STRAIN MG, SEQUENCE FROM N.A.)
RA   MCDANIEL C.S., HARPER L.L., WILD J.R.;
RL   J. BACTERIOL. 170:2306-2311(1988).
CC   -!- PATHWAY: PESTICIDE DETOXIFICATION.
CC   -!- SUBCELLULAR LOCATION: MEMBRANE-ASSOCIATED.
DR   EMBL; M20392; PPPTE.
DR   PIR; A28214; A28214.
KW   PLASMID; HYDROLASE; MEMBRANE.
SQ   SEQUENCE   325 AA;  35583 MW;  52S207 CN;

Initial Score    =  227   Optimized Score = 251   Significance = 76.20
Residue Identity =  76%   Matches         = 260   Mismatches   =    51
Gaps             =   28   Conservative Substitutions           =     0
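
The Residue Identity figures in these summaries are consistent with dividing Matches by the sum of Matches, Mismatches and Gaps and truncating to a whole percent. The sketch below illustrates that inferred relationship; the function name is hypothetical and the formula is deduced from the printed numbers, not taken from the program's documentation.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned columns, truncated to an integer.

    Assumption: the denominator is matches + mismatches + gaps, which
    reproduces the percentages printed in this report.
    """
    aligned = matches + mismatches + gaps
    if aligned <= 0:
        raise ValueError("alignment must contain at least one column")
    return (100 * matches) // aligned


# (matches, mismatches, gaps) as printed for three of the alignments.
for name, counts in {
    "OPD$PSEDI": (260, 51, 28),
    "MIS$HUMAN": (99, 211, 92),
    "POLG$WNV": (87, 231, 78),
}.items():
    print(f"{name:10s} {residue_identity(*counts)}%")
```

Truncation rather than rounding is suggested by the West Nile virus polyprotein hit, where 87 matches over 396 columns is 21.97% and the report prints 21%.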

X 10 20 30 40 50 60 70 

MQTRR WLKS AAAGTLLGGL AGCA FWLDRSASA I GS I R ARP I T I SEAGFTLTHED I CGSS AGFLRAWPEFFG 

i t i i i t i i i i i i i i i t i i i t i t i t t : t i i i i i t i t t i t t i i i t i i i i i i t t t i ii 

■ t i i ■ i i i i i i i i i i i t t i i t i i i i i i i i t i i > i i t i i i t i i i i i i ■ i i t i i t ii 

MQTRR WLK S AAARTLLGGLAGC ATWLDRSAQ AMRS I R ARP I T I SEAGFTLTHED I S AARQDSC VLGQS 

X 10 20 30 40 50 60 



80 30 100 110 120 130 

SRKALAEKAVRGLR— ARAAGVR T I VD VSTFD IGRDVSLL.AEVSRAADVH I VAATGLWFDPPLSM 

i i tit it .ti til i i i i i i i i t i i » i » i 

t i lit t t ii tit i i i i i i i i i i i i i i ■ 

SSVA9SSSGKGCER I ARQSGWRANDCRCVDFRYRSRRQF I GR GFAGCRR S YLAATGLWFDPPLSM 

70 80 30 100 110 120 130 



140 150 ISO 170 180 190 200 

RLRYVEELT QFFLRE 7. QYG I EDTG I RAG 1 1 KVATTGK ATPFQELVLKAAARASLATGVPVTTHTAAS 

i i i i t i i i i • i t t i i i i i i i i i t i t i i t i i t i i i i i i i i i i t i > i i i ■ t i i i i t i t 

tiiitiitt i i i t i t i i t i » i i t t t i t i i i i i i i i t i i i t i t i t i i t i t i i i i i t t 

RLRYVEELTLVLPAVRFNMASK Y TG I RAG I I KVATTGK ATPFQELVLKAAARASLATGVPVTTHTAAS 

140 150 ISO 170 180 190 200 



210 220 230 240 250 260 270 

QRDGERGRPPFLSPKLEPSRVO I G! ISDDTDDLSYLTALLRGYL I GL.DH I PHSA I GLEDNASASPLLG I RSWQ 

i i i t i i i i i i i i < i i t • i i t i i i i t i i t t i i t i i i t t i i \ i i i i i i i i • i i i i i i i i i i i • i i t i i » i i i i i 
i i i i i i i i i i i i i i i i i i i i i c i i i • l i < i i t • i i i t i i i i i t i i i i i i t i t t i i i • i i t i i i i t i i i i i i t 

QRDGERGRPPFLSPKLEPSR VC I G* ISDDTDDLSYLTALLRGYL I GL DH I PHSA I GLEDNASASPLLG I RSWQ 
210 220 230 240 250 260 270 

280 290 300 310 320 X 330 

TR ALL I K AL I DGGYMKQ I LV3NDWI .FGFSSY VTN I MDVMDRVNPDGMAF I PLRV I PF YERR 

I I I I I I I I I I 1 I T I I I I I I I I I I I I I I I I I I t I I I I t I I I t 1 I I I t I I t I 
I I I I t I I I I I I t I I I I I I I I I I I I I I t I I I I I I I I I t t t I I I t I I I I I I I 

TRALL I K AL I DGGYMK0 I LVSNDWLFGFSSYVTN I MDVMDRVNPDGMAF I H 
280 2S0 30O 310 320 X 



2. LOW344-FIG1.PEP
   YHL1$HCMVA   HYPOTHETICAL PROTEIN HHLF1.

ID   YHL1$HCMVA     STANDARD;      PRT;   788 AA.
AC   P09695;
DT   01-MAR-1989 (REL. 10, CREATED)
DT   01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)
DT   01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE)
DE   HYPOTHETICAL PROTEIN HHLF1.
OS   HUMAN CYTOMEGALOVIRUS (STRAIN AD169).
OC   VIRIDAE; DS-DNA ENVELOPED VIRUSES; HERPESVIRIDAE.
RN   [1] (SEQUENCE FROM N.A.)
RA   WESTON K., BARRELL B.G.;
RL   J. MOL. BIOL. 192:177-208(1986).
CC   -!- SIMILARITY: TO HWLF1, HHLF5, HHLF6, HHLF7, AND HQRF1.
DR   EMBL; X04650; HEHCMVU.
DR   PIR; C27349; QSBEE3.
KW   HYPOTHETICAL PROTEIN.
FT   CARBOHYD     76     76       POTENTIAL.
FT   CARBOHYD    118    118       POTENTIAL.
FT   CARBOHYD    223    223       POTENTIAL.



SQ SEQUENCE AA5 to'^l ciW* ;2b64523 DM? 



Initial Score    =    9   Optimized Score =  70   Significance = 4.74
Residue Identity =  23%   Matches         =  90   Mismatches   =  241

Saps 5'J Conservative Substitutions - o 

X 10 20 30 40 50 

M9TRRWLKSAAAGTLLGGLAGCATWLDRSA-9A I G S I RARP I T I SEAGFTLTHED 

ii i(i ( t i lit tl lit 

it til iti ii« ii iii 

ASAPHPASLLTAVRRHLNQRi.CCGULALGAVLPARWLGCAAGPATGTAAGTTSPPAASGTETEAAGGDAPCA 
330 X 340 350 360 370 330 330 

60 70 SO 30 100 110 120 

, I CG — SSAGFLRAWPEFFGSRK A LA-EK AVRGLRARAAGVRT I VDVSTFD I GRDVSLLAEVSRAADVH 

ii it i i ) i i i i i i ■ it i i tilt 

t i ii i i ■ it i i t i i it t i iiii 

I AG A VGS AVP VPPQPYG AAGGG A I C VPNADAH A VVG ADA AA AAAPT VM VGST AM AGPAAS — GTVPRAMLW 
400 410 420 430 440 450 460 

130 140 150 160 170 180 
I V A ATGLWF-DPPLSMRLR YVEELTQFFLRE IQYGI EDTG- 1 RAG I IK VATTGKATPFQE — ; LVLKA 

llll t I I I < I 1 I I 

till 1111(1 I I I 

LLDELG A VFGYCPLDGH V YPi . A AEt .SHFI. .RAGVLGAL ALGRESAP A AE AARRLLPELDREQWERPRWDALHL 
470 480 490 500 510 520 530 

130 200 210 220 230 240 250 

AAR ASLATGVP VTTHT A ASQRDGERGRPPFLSPKLEPSRVC I GHSDDT DDLSYLT ALLR — GYL I GLD — H I 

lit i i i i t i i i ii it ill 

tit i t i i i i i i it it iii 

HPR A AL W AREP-HG9L AFLLRPG— RGE AEVLTL ATKHPA I CANVEDYLGD ARRRADABALGLDLATV 

540 550 560 570 580 530 600 

260 270 280 230 300 310 

PHS A I G LEDNAS ASPLLG I RSWQTRALL I K AL I DQGYMK Q I L VSNDWLFGFSSY VTN I MDVMD 

it i i i ii ii ii I i 

t I (till ii it I I 

VMEAGGQMIHKKTKKPKGKEDESLMK.GKHSRYTR— PTEPPLTPQASLGRALRRDDEDWKPS RLPGED 

610 620 630 640 650 660 

320 330 340 X 

RVNPDGMAF I PL RV I PFYERR— ASHRK RC9 ASL 

t i I t>i it 

it t t ii it 

SWYDLDETFWVLGSNRKNDVYGRRWKKTVLRCGLE I DRPMPTVPKG 
670 680 630 700 X 710 



3. LOW344-FIG1.PEP
   YQR1$HCMVA   HYPOTHETICAL PROTEIN HQRF1.

ID   YQR1$HCMVA     STANDARD;      PRT;   846 AA.
AC   P03715;
DT   01-MAR-1989 (REL. 10, CREATED)
DT   01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)
DT   01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE)
DE   HYPOTHETICAL PROTEIN HQRF1.
OS   HUMAN CYTOMEGALOVIRUS (STRAIN AD169).
OC   VIRIDAE; DS-DNA ENVELOPED VIRUSES; HERPESVIRIDAE.
RN   [1] (SEQUENCE FROM N.A.)
RA   WESTON K., BARRELL B.G.;
RL   J. MOL. BIOL. 192:177-208(1986).
CC   -!- SIMILARITY: TO HWLF1, HHLF1, HHLF5, HHLF6, AND HHLF7.
DR   EMBL; X04650; HEHCMVU.
DR   PIR; C2607G; GQEEC3.
KW   HYPOTHETICAL PROTEIN.
FT   CARBOHYD     76     76       POTENTIAL.
FT   CARBOHYD    118    118       POTENTIAL.
FT   CARBOHYD    223    223       POTENTIAL.
SQ   SEQUENCE   846 AA;  91047 MW;  3448605 CN;

Initial Score    =    9   Optimized Score =  69   Significance = 4.34
Residue Identity =  22%   Matches         =  86   Mismatches   =  248
Gaps             =   46   Conservative Substitutions           =    0

X 10 20 30 40 50 

MQTRRWLK SAAAGTLLGGL AGC ATWLDRSA-QA I G S I RARP I T I SEAGFTLTHED 

ii lit i * i it* ii i>t 

i » » i i iii tit ii tit 

ASAPHPASLLTAVRRHLN9RLCCGWLALGAVLPARWLGCAAGPATGTAAGTTSPPAASGTETEAAGGDAPCA 

330 X 340 350 360 370 380 390 

60 70 SO SO 1 OO HO 1 20 

I CG — SSAGFLRAWPEFFGSRK A LA-EK AVRGLRARAAGVRT I VDVSTFD I GRDVSLLAEVSRAADVH 

ii ii i i t t i i ) i i i ii i i iiii 

it ii i i t ii i i ii i ii i i iii i 

I AGAVGSAVPVPPGPYGAAGGGA I CVFNADAHAV VGADAAAAAAPTVMVGSTAMAGPAAS — GTVPRAML VV 
400 410 420 430 440 450 460 

130 140 150 ISO 170 180 
I V A ATGL WF— DPPLSMRLR Y VEEL TOFFLRE I QYG I EDTG- I RAG I I K VATTGK ATPFQE LVLKA 

till ill i i i i i i 

iiit iii t i i i t i 

LLDELGAVFGYCPLDGHVYPLAAELSHFLRAGVLGALAL GRESAPAAEAARRLLPELDREQWERPRWDALHL 
470 480 490 500 510 520 530 

190 200 210 220 230 240 250 

A ARASL ATG VPVTTHT A ASQRDGE-RGRPPFLSPKLEPSRVC I GHSDDTDDLS YLT ALLRGYL I GLDH I PHS 



i i t 



HPRAALWAREP HGQWEFMFREQRGDP I NDPLAFRLSDARTLGLDLTT VMTERQSQLPEK Y I GFYQ I RKP 

540 550 5S0 570 5S0 530 600 

2GO 270 280 230 300 310 

A I GLEDN AS ASFLLG I RSW9T N ALL I KAL I DGGYMKQ I L VSNDWLFGFSS YV TN I MDVMDRVNP 

t iiti it ii ii t i 

• iiit t i it ii ii 

PWLME QPPPPSR9TKPDAATMPPPLSAQASVSYALRYDDESWRPLSTVDDHKAWLDLDESHWVLG 

610 820 630 640 650 660 

320 330 340 X 

D— GMAF IPLR-VI PF YERR ASHRK RCGASL 

i it t i ( 

i ii ii i 

DSRPDD I KQRRLLK ATORRGAEI DRPMP WPEECYDQRFT 
670 680 690 700 



4. LOW344-FIG1.PEP
   CYAA$BORPE   CALMODULIN-SENSITIVE ADENYLATE CYCLASE PRECURSOR (

ID   CYAA$BORPE     STANDARD;      PRT;  1706 AA.
AC   P15318;
DT   01-APR-1990 (REL. 14, CREATED)
DT   01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE)
DT   01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE)
DE   CALMODULIN-SENSITIVE ADENYLATE CYCLASE PRECURSOR (EC 4.6.1.1)
DE   (ATP PYROPHOSPHATE-LYASE) (ADENYLYL CYCLASE) (CYCLOLYSIN) (CONTAINS:
DE   HEMOLYSIN) (GENE NAME: CYA).
OS   BORDETELLA PERTUSSIS.
OC   PROKARYOTA; BACTERIA; GRAM-NEGATIVE AEROBIC RODS AND COCCI; UNCERTAIN.
RN   [1] (STRAIN 18323, SEQUENCE FROM N.A.)
RA   GLASER P., LADANT D., SEZER O., PICHOT F., ULLMANN A., DANCHIN A.;
RL   MOL. MICROBIOL. 2:19-30(1988).
CC   -!- FUNCTION: THIS ADENYLATE CYCLASE BELONGS TO A SPECIAL CLASS OF
CC       BACTERIAL TOXIN. IT ACTS ON MAMMALIAN CELLS BY ELEVATING CAMP-
CC       CONCENTRATION AND THUS DISRUPTS NORMAL CELL FUNCTION.
CC   -!- CATALYTIC ACTIVITY: ATP = 3',5'-CYCLIC AMP + PYROPHOSPHATE.
CC   -!- SUBCELLULAR LOCATION: RELEASED EXTRACELLULARLY IN A PROCESSED
CC       FORM.
CC   -!- DISEASE: WHOOPING COUGH.
DR   EMBL; Y00545; BPCYA.
DR   PIR; S00893; S00893.
DR   PROSITE; PS00350; HEMOLYSIN_CALCIUM.
KW   LYASE; CAMP SYNTHESIS; HEMOLYSIS; TOXIN; VIRULENCE; WHOOPING COUGH;
KW   CALCIUM-BINDING; REPEAT.
FT   CHAIN         1   1706       CALMODULIN-SENSITIVE ADENYLATE
FT                                CYCLASE PRECURSOR.
FT   CHAIN         1      ?       CALMODULIN-SENSITIVE ADENYLATE
FT                                CYCLASE.
FT   CHAIN         1   1300       HEMOLYSIN, BY SIMILARITY TO E.COLI
FT                                HEMOLYSINS (HYLA).
FT   DOMAIN        1    360       A, CALMODULIN-SENSITIVE CATALYTIC
FT                                CENTRE.
FT   DOMAIN      361    912       B, ALA/GLY RICH.
FT   DOMAIN      913   1656       C.
FT   DOMAIN     1657   1706       D, ASP/GLY RICH.
SQ   SEQUENCE  1706 AA;  177505 MW;  1.25518E+07 CN;

Initial Score    =   10   Optimized Score =  69   Significance = 4.34
Residue Identity =  23%   Matches         =  93   Mismatches   =  206
Gaps             =   97   Conservative Substitutions           =    0



X 10 20 30 

MQTRRWLKSAAAGTLLGGLAGCATWLDRSAQA I GS I R- 



40 

-ARPITI SEA 



LMTQFGRAGSTNTPQEAASLSAAVFGL— GEASSAVAETVSGFFRGSSRWAGGFGVAGGAMALGGGIAAAVGA 
490 500 510 520 530 540 550 



50 60 70 

GFTLTHED I CG — SS AGFLRAWPEFFG- 



80 90 100 110 

-SRK ALAFK A VRGLRARAAGVRT I VDVSTFD I GRDVSLLAE 



GMSLTDDAPAGQK AA AGAE I ALOLTGGTVELASS I ALA- 
560 570 580 



— LAAARGVTSGLQVAGASAG- 
590 600 



120 130 140 150 

VSRAADVH I VAATGLWFDPPLSMRLR Y VEELTQFFLRE I 9YG I E- 



160 170 
-DTGIRAGI IKVATT 



AAAGALAAALSPMEIYGLVQQSHYADQLDKLAQESSAYGYEGDALLAQLYRDKTAAEGAVAGVSAVLST 

610 620 630 640 650 660 670 



180 190 200 

GKATPF9ELVLKAAARASLATGVPVTTHTAASQRDG- 



210 220 230 

-ERGRPPF LSPKLEPSR VC I GHSDDTDDLSYL 



VGA AVS I AAA-AS-VVGAPVA VVT — SLLTGALNG I LRGVQQP I I EKL- 

680 690 700 710 720 



-AND YARK I DELGGP 
730 



240 250 260 270 280 290 300 

T A LLRGYL I GLDH I PHSA I GLEDNASASPLLG I RSWOTRALL I KALI DQGYMKS I LVSNDWLFGF 



QAYFEKNLQARHEQLANSDGLRKMLADLOAGWNASSV I G- 
740 750 760 770 



-VQTTE I SKSAL ELA A I TGNADNL — K 

780 790 



310 320 330 340 X 

SSYVTN I MDVMDRVNPDGMAF I P — LRV I PFYERR ASHRK RCO ASL 



SVDV F VDRF VQGERV AGOP VVLDVA AGG I D I AS-RKGERP ALTF I TPLAAPG 

800 810 820 830 840 



5. LOW344-FIG1.PEP
   CAP1$MESCR   PHOSPHOENOLPYRUVATE CARBOXYLASE 1 (EC 4.1.1.31) (G

ID   CAP1$MESCR     STANDARD;      PRT;   966 AA.
AC   P10490;
DT   01-JUL-1989 (REL. 11, CREATED)
DT   01-JUL-1989 (REL. 11, LAST SEQUENCE UPDATE)
DT   01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE)
DE   PHOSPHOENOLPYRUVATE CARBOXYLASE 1 (EC 4.1.1.31) (GENE NAME: PPCA).
OS   COMMON ICE PLANT (MESEMBRYANTHEMUM CRYSTALLINUM).
OC   EUKARYOTA; PLANTA; SPERMATOPHYTA; ANGIOSPERMAE.
RN   [1] (SEQUENCE FROM N.A.)
RA   RICKERS J., CUSHMAN J., MICHALOWSKI C., SCHMITT J., BOHNERT H.J.;
RL   MOL. GEN. GENET. 215:447-454(1989).
RN   [2] (SEQUENCE FROM N.A.)
RA   CUSHMAN J.C., BOHNERT H.J.;
RL   NUCLEIC ACIDS RES. 17:6745-6745(1989).
CC   -!- FUNCTION: TO FORM OXALOACETATE, A FOUR-CARBON DICARBOXYLIC ACID
CC       SOURCE FOR THE TRICARBOXYLIC ACID CYCLE.
CC   -!- CATALYTIC ACTIVITY: ORTHOPHOSPHATE + OXALOACETATE = H(2)O +
CC       PHOSPHOENOLPYRUVATE + CO(2).
CC   -!- PATHWAY: TRICARBOXYLIC ACID CYCLE.
DR   EMBL; X13660; MCPPCR.
DR   EMBL; X14587; MCPPCA.
KW   LYASE; CARBON DIOXIDE FIXATION; ALLOSTERIC ENZYME;
KW   TRICARBOXYLIC ACID CYCLE.
SQ   SEQUENCE   966 AA;  110659 MW;  4690045 CN;

Initial Score    =    8   Optimized Score =  69   Significance = 4.34
Residue Identity =  23%   Matches         =  94   Mismatches   =  212
Gaps             =   90   Conservative Substitutions           =    0

X lO 20 30 40 50 

MOTRRVVLKSAAAGTLLQGLAGCATWLDRSABA I GS I R ARP I T I SEAGFTLTHED I 

t i liii t i i it i t i 

i i it ii it i ti iii 

S VRRSLLQK HGR I RDCLAQLY AKD I TPDDK. QELDE ALQRE I QAAFRTDE I RRTQPTP9DEMRAGMSYFHET I 
180 190 200 210 220 230 240 

60 70 80 90 lOO 110 

CGSSAGFLRAWPEFFGSRKALAEKAVRGLRAR A AG VRT I VDVSTFD I GRD VSLL AEVSR 

lit i it i i it i iii iii 

III I l| | I < I I It! Ill 

WNGVPKFLR RLDTALK— N I G I TERVPYNAPL I QFSSWMGGDRDGNPRVTPEVTRDVCLLA-RMM 

250 260 270 280 290 300 310 



120 130 1.40 ISO 160 170 

AADVH I VAATGLWFDPPLSM RLR-YVEELTGFFLRE I QYG I EDTG I RAG I I K VATTGK ATPFQE — L 

tt i i I i t it iii i ii iii 

it t i tii t t iii t it iii 

AANMYFSQ I DELMF — ELSMWRCTDELRERAEELHKYSKRDSKHY I E FWKQ I PSSEPYR 

320 330 340 350 360 



180 190 200 210 220 230 

VLKA AARASLATGV P V— TTHT AASQRDGERGRPPFLSP— KLEPSRVC I GHSDDTDDLS 

ii I it i ii it i i I I I i it 

ii i i * t i t i i i iitt i ii 

V I LADVRDKLYYTRERSRQLLASE VSE I PVEATFTE I DQ FLEPLELC YRSLCACGDRPVADGS 

370 380 3S0 400 410 420 430 

240 250 260 270 280 290 300 

YL TALLRGYL I GLDH I PHS A I GLEDN ASASPLLG I RSWQTR ALL I K AL I DQGYMKQ I LVSNDWLFGF 

i i tit i iiit i t > iii 

t i tit t i i i i i i • iii 

LLDFMRQvATFGLCLVKLD I RSESERHTDVMDA I TTHLG I GS — YRDWTEEKRQD — WLLSELRGKRPLFGP 
440 450 460 470 480 490 



310 320 330 340 X 

SSYVT— N I MD VMDRVNPDGM AF I PLR V I PFY-ERR ASHRKRCQASL 

i i i t i t it i i t 

i i t i i i it i i i 

DLPRTDE I ADVLDT I N — V I AELPSDSFGAYV I SM AT APSD VL AVELLQRECK VKK 
5O0 510 520 530 540 X 550 



6. LOW344-FIG1.PEP
   MIS$HUMAN    MULLERIAN INHIBITING FACTOR PRECURSOR (MIS).

ID   MIS$HUMAN      STANDARD;      PRT;   560 AA.
AC   P03971;
DT   23-OCT-1986 (REL. 02, CREATED)
DT   23-OCT-1986 (REL. 02, LAST SEQUENCE UPDATE)
DT   01-AUG-1988 (REL. 08, LAST ANNOTATION UPDATE)
DE   MULLERIAN INHIBITING FACTOR PRECURSOR (MIS).
OS   HUMAN (HOMO SAPIENS).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1] (SEQUENCE FROM N.A.)
RA   CATE R.L., MATTALIANO R.J., HESSION C., TIZARD R., FARBER N.M.,
RA   CHEUNG A., NINFA E.G., FREY A.Z., GASH D.J., CHOW E.P., FISHER R.A.,
RA   BERTONIS J.M., TORRES G., WALLNER B.P., RAMACHANDRAN K.L.,
RA   RAGIN R.C., MANGANARO T.F., MCLAUGHLIN D.T., DONAHOE P.K.;
RL   CELL 45:685-698(1986).
CC   -!- FUNCTION: THIS GLYCOPROTEIN, PRODUCED BY THE SERTOLI CELLS OF THE
CC       TESTIS, CAUSES REGRESSION OF THE MULLERIAN DUCT. IT ALSO IS ABLE
CC       TO INHIBIT THE GROWTH OF TUMORS DERIVED FROM TISSUES OF MULLERIAN
CC       DUCT ORIGIN.
CC   -!- SUBUNIT: DIMER OF IDENTICAL CHAINS LINKED BY AN INTERCHAIN
CC       DISULFIDE BOND.
CC   -!- SIMILARITY: TO TGF-BETA, INHIBIN ALPHA AND BETA CHAINS.
CC   -!- ALTHOUGH IT DOES NOT COMPETE WITH EGF FOR RECEPTOR BINDING SITES,
CC       MIS CAN INHIBIT THE AUTOPHOSPHORYLATION OF THE EGF RECEPTOR IN
CC       VITRO.
DR   EMBL; K03474; HSMIS.
DR   PIR; A01397; WFHUM.
DR   PROSITE; PS00250; TGF_BETA.
KW   GROWTH FACTOR; GLYCOPROTEIN; GONADAL DIFFERENTIATION; SIGNAL.
FT   SIGNAL        1     18       PUTATIVE.
FT   PROPEP       19     25       PUTATIVE.
FT   CHAIN        26    560       MULLERIAN INHIBITING FACTOR.
FT   CARBOHYD     64     64       POTENTIAL.
FT   CARBOHYD    329    329       POTENTIAL.
SQ   SEQUENCE   560 AA;  59192 MW;  1428811 CN;

Initial Score    =   13   Optimized Score =  68   Significance = 3.95
Residue Identity =  24%   Matches         =  99   Mismatches   =  211
Gaps             =   92   Conservative Substitutions           =    0



X 10 20 

MGTRRVVL— KSAAAGTLLG—GLA- 



30 40 SO 
GCAT WL-DRSAQ A- I GS I R ARP I T~ I SEAGFTL 



LPGAQSLCPSRDTRYLVLAVDRPAGAWRGSGLALTLQPRGEDSRLSTARLGALLFGDDHRCFTRMTPALLLL 
130 200 210 220 230 240 250 

60 70 30 SO 1 OO HO 

THED I C6SSA-GFLRAWPEFFGSRKALAE K AVRGLRARAAGVRT I VDVSTFD I GRDVSLLAEVSR-AA 



PRSEPAPLFAHGQLDTVPFPPPRPSAELEESPPSADPFLETLTRLVR- 
260 270 280 230 



ALRVPPARASAPRLAL 

300 310 



120 130 140 ISO 160 170 130 

DVH I VA — ATGL — WFDPPLSMRLR Y VEELTQFFLRE I © YG I EDTG I RAG I I K V ATTGK ATPFQEL V LK 

i i t i ii titi t i iii it (i 

• > t i ii ii<i t t i * i it ii 

DPDALAGFPQGLVNLSDPAALERLLDGEEPLLLLLR PTAATTGDPAPLHDPTSAPWATALARRVAAELQ 

320 330 340 350 360 370 380 

190 200 210 220 230 240 

AAA — RASLATGVPVTTHTAASQRDGERGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTAL — LRGYL I GLD— 

til ii ii i t t • i ti i iii ii it 

iii ii ii i t ii i t i t ill ii » i 

AAAAELRSLPGLPPATAPLLA RLLALCP— GGP GGLGDPLRALLLLKALQGLRVEWRGRDP 

390 400 410 420 430 440 

250 2GO 270 2SO 290 300 310 

H I PHSA— I GLEDN ASASP-LLG I RSWQTRA LL I KAL I DQGYMKQI L VSND WLFGFSS YVTN I MD V 



RGPGRAQRSAGAT AADGPCALRELSVDLRAERSVL I PET YOANNCQG- 
450 460 *70 480 490 



-VCGWPQSDRNPRYG 
500 



320 330 340 X 

MDR VNPDGM AF I PLRV I PFY ER-RASHRKRCBASL 

i t i t i,i t t it ■ 

i t lit t it ii i 

NHWLLLKM9ARGAALARPPCCVPTAYAGKLLISLSEERISAHHVPNMVATECGCR 
510 520 530 540 550 X 560 



7. LOW344-FIG1.PEP
   ATXA$LEIDO   PROBABLE E1-E2 TYPE CATION ATPASE 1A (EC 3.6.1.-).

ID   ATXA$LEIDO     STANDARD;      PRT;   974 AA.
AC   P11718;
DT   01-OCT-1989 (REL. 12, CREATED)
DT   01-OCT-1989 (REL. 12, LAST SEQUENCE UPDATE)
DT   01-OCT-1989 (REL. 12, LAST ANNOTATION UPDATE)
DE   PROBABLE E1-E2 TYPE CATION ATPASE 1A (EC 3.6.1.-).
OS   LEISHMANIA DONOVANI.
OC   EUKARYOTA; PROTOZOA; SARCOMASTIGOPHORA; MASTIGOPHORA.
RN   [1] (SEQUENCE FROM N.A.)
RA   MEADE J.C., SHAW J., LEMASTER S., GALLAGHER G., STRINGER J.R.;
RL   MOL. CELL. BIOL. 7:3937-3946(1987).
CC   -!- CATALYTIC ACTIVITY: ATP + H(2)O = ADP + ORTHOPHOSPHATE.
CC   -!- SIMILARITY: BELONGS TO THE CATION TRANSPORT ATPASES FAMILY
CC       (E1-E2 ATPASES).
CC   -!- SIMILARITY: THE TWO L.DONOVANI CATION-TRANSPORTING ATPASE GENES
CC       ARE 98% HOMOLOGOUS.
CC   -!- CAUTION: IN POSITION 351 THE N.A. SEQUENCE PREDICTS ARG, THE
CC       PROTEIN TRANSLATION SHOWN IN THREE PLACES IN THE PAPER GIVES
CC       LYS, WHICH IS CONSERVED IN ALL KNOWN E1-E2 ATPASES. WE HAVE
CC       USED LYS AFTER CONFIRMATION FROM THE AUTHORS.
DR   EMBL; M17889; LDCATP1.
DR   PROSITE; PS00154; ATPASE_E1_E2.
KW   HYDROLASE; ATP HYDROLYSIS; TRANSMEMBRANE; PHOSPHORYLATION;
KW   MAGNESIUM; ATP-BINDING.
FT   TRANSMEM     93    112       PUTATIVE.
FT   TRANSMEM    118    137       PUTATIVE.
FT   TRANSMEM    265    286       PUTATIVE.
FT   TRANSMEM    295    321       PUTATIVE.
FT   TRANSMEM    631    651       PUTATIVE.
FT   TRANSMEM    662    684       PUTATIVE.
FT   TRANSMEM    698    712       PUTATIVE.
FT   TRANSMEM    738    761       PUTATIVE.
FT   TRANSMEM    813    840       PUTATIVE.
FT   TRANSMEM    869    887       PUTATIVE.
FT   MOD_RES     351    351       PHOSPHORYLATION.
SQ   SEQUENCE   974 AA;  107448 MW;  5115862 CN;

Initial Score    =   18   Optimized Score =  68   Significance = 3.95
Residue Identity =  24%   Matches         =  93   Mismatches   =  210
Gaps             =   82   Conservative Substitutions           =    0



X lO 20 30 40 SO 

MQTRR VVLKSAAAGTLLGGLAGCATWLDRSAOA I GS I RARP I T I SEAGFTLTHE— 



FLDPPRPDTKDT I RRSKEYGVDVKM I TGDHLL I AKEMC-RMLDLDPN I LTADKLPQ I KDANDLPEDLGEK YG 
500 510 520 530 540 550 560 

60 70 80 90 100 110 120 

D I CGSSAGFLRAWPEFFGSRKALAEK.AVRGLRAR AAGVRT I VDVSTFD I GRD VSLL AEVSRA AD VH I V A 



DMMLSVGGFAQVFPE- 
570 580 



-HKFM I VETLR9RGYTCAMTGDGVNDAPALKRADV — G I AVHGATDAARA A 



590 



600 



610 



620 



630 



130 140 150 1 60 170 
ATGL WFDPPLSMRLR YVE ELTOFFLRE IQYGI EDT G I RAG 1 1 K V ATTGK A TPFQ E 



ADMVLTEPGLS VVVEAMLVSREVFQRMLSFLTYR I SATLQLVCFFF I ACFSLTPK A YGSVDPHFQFFHL 

640 G50 SSO 670 680 690 700 



180 190 200 210 220 230 240 

L VLK AA ARASL ATG VP VT THTA ASQRDGERGRPPFLSPKLEPSRVC I GHSDDTDDLSYLT ALLRGYL I G 

it lit li i i it i t t i i 

it lit it t i it i iti i 

P VLMFML I TLLNDGCLMT I GYDHV I PS ERPQKWNL-PWFVS AS I L A AV ACGSSLM 

710 720 730 740 750 



250 260 270 2SO 290 300 310 

LDH I PHS A I GLEDN ASASPLLG I RSW9TR ALL I K AL I DGGYMKQ I L VSNDWLFGFSS Y VTN I MDVMDR 

i i l t t < tii i i ii< lit iti 

i iiii i tii i i iti iii iti 

L LW I GLE GYSSGY YEN8WFHRLGLAQLPQGKLVTMMYLK- I S I S-DFLTLFSSRTGGHFFFYMP 

760 770 7S0 790 800 810 



320 330 340 X 

VNP — DGMAF I PLRV I PFYERRAS — HRKRCQASL 

t tilt ttii i 

■ iiii, ii i i i 

PSP I LFCGAI I SLLV STMAASFWHKSRPDNVLTEGLAWGQTN 

820 830 840 850 860 



8. L0W344-FIG1. PEP 

POLGSWNV GENOME POLYPROTEIN (CAPS ID PROTEIN C? ENVELOPE GLY 



ID   POLG$WNV   STANDARD;   PRT;   3430 AA.
AC   P06935;
DT   01-JAN-1988 (REL. 06, CREATED)
DT   01-JAN-1988 (REL. 06, LAST SEQUENCE UPDATE)
DT   01-MAR-1989 (REL. 10, LAST ANNOTATION UPDATE)
DE   GENOME POLYPROTEIN (CAPSID PROTEIN C; ENVELOPE GLYCOPROTEIN M; MAJOR
DE   ENVELOPE PROTEIN E; NONSTRUCTURAL PROTEINS NS1, NS2A, NS2B, NS3, NS4A,
DE   NS4B AND NS5).
OS   WEST NILE VIRUS.
OC   VIRIDAE; SS-RNA ENVELOPED VIRUSES; FLAVIVIRIDAE.
RN   [1] (SEQUENCE FROM N.A.)
RA   CASTLE E., LEIDNER U., NOWAK T., WENGLER G., WENGLER G.;
RL   VIROLOGY 149:10-26(1986).
RN   [2] (SEQUENCE OF 1-291 FROM N.A.)
RA   CASTLE E., NOWAK T., LEIDNER U., WENGLER G., WENGLER G.;
RL   VIROLOGY 145:227-236(1985).
RN   [3] (SEQUENCE OF 255-854 FROM N.A.)
RA   WENGLER G., CASTLE E., LEIDNER U., NOWAK T., WENGLER G.;
RL   VIROLOGY 147:264-274(1985).
CC   -!- SUBUNIT: THE VIRION OF THIS VIRUS IS A NUCLEOCAPSID COVERED BY A
CC       LIPOPROTEIN ENVELOPE. THE ENVELOPE CONSISTS OF TWO PROTEINS:
CC       PROTEIN M AND GLYCOPROTEIN E. THE NUCLEOCAPSID IS A COMPLEX OF
CC       PROTEIN C AND MRNA.
DR   PIR; A25256; GNWVWV.
DR   EMBL; M10103; FLWNVSP.
KW   POLYPROTEIN; COAT PROTEIN; ENVELOPE PROTEIN; NONSTRUCTURAL PROTEIN;
KW   TRANSMEMBRANE; GLYCOPROTEIN.



FT   CHAIN         1    105   CAPSID PROTEIN C.
FT   PROPEP      106    215
FT   CHAIN       216    290   ENVELOPE GLYCOPROTEIN M.
FT   CHAIN       291    787   MAJOR ENVELOPE PROTEIN E.
FT   CHAIN       788   1187   NONSTRUCTURAL PROTEIN NS1.
FT   CHAIN      1188   1354   NONSTRUCTURAL PROTEIN NS2A.
FT   CHAIN      1355   1484   NONSTRUCTURAL PROTEIN NS2B.
FT   CHAIN      1485   2109   NONSTRUCTURAL PROTEIN NS3.
FT   CHAIN      2110   2394   NONSTRUCTURAL PROTEIN NS4A.
FT   CHAIN      2395   2579   NONSTRUCTURAL PROTEIN NS4B.
FT   CHAIN      2580   3430   NONSTRUCTURAL PROTEIN NS5.


FT   CARBOHYD    138    138   POTENTIAL.
FT   CARBOHYD    917    917   POTENTIAL.
FT   CARBOHYD    962    962   POTENTIAL.
FT   CARBOHYD    994    994   POTENTIAL.
FT   CARBOHYD   1283   1289   POTENTIAL.
FT   CARBOHYD   1659   1659   POTENTIAL.
FT   CARBOHYD   2336   233B   POTENTIAL.
FT   CARBOHYD   2483   2489   POTENTIAL.
FT   CARBOHYD   2573   2573   POTENTIAL.
FT   CARBOHYD   2739   2739   POTENTIAL.
FT   CARBOHYD   2759   2759   POTENTIAL.
FT   CARBOHYD   2864   2864   POTENTIAL.
FT   CARBOHYD   2902   2902   POTENTIAL.


SQ 


SEQUENCE 


3430 


AA ? 37S640 


MW? 2. 098' 



Initial Score = 10 Optimized Score = 68 Significance = 3.95

Residue Identity = 21% Matches = 87 Mismatches = 231

Gaps = 78 Conservative Substitutions = 0



X 10 20 30 40 50 

MQTRR WLKS A A AGTLLGGL AGC AT WLDRSAQ A I GS I RARP I T I SEAGFTLTHE- 

i i i i i i i i • it i it i 

• #1 ii i t i iii i it i 

VESHGK I GATOAGRFS I TPSAPSYTLKLGE YGE VTVDCEPRSG I DTSAYYVMS VGEKSFLVHREW 

440 450 460 470 480 430 500 



60 70 80 90 100 
D I CGSSAG FLRAWPE FFGSRKAL — AEK A VRGLRAR AAGV RT I VDVSTFD I GR 

tiii i i i i i i i tit t i i 

i i f i i i i ii ii iii i i i 

FMDLNLPWSSAGSTTWRNRETLMEFEEPHATKQSVVALGSQEGALHOALAGAIPVEFSSNTVKLTSGHLKCR 
510 520 530 540 550 560 570 



110 120 130 140 150 160 170 

DVSLLAEVSRAADVH I VAATGLWFDFPLSMRLR WEELTQFFLRE I OYG I EDTG I RAG 1 1 K VATTGK ATPFQ 

i i iii iii lit ii 

i i til iii iii ii 

VKMEKLQLKGTTYGVCSKAFKFARTPADTGHGTWLEL QYTGTDGPCK VP I SSVASLNDLTPVG 

580 590 GOO 610 620 630 



180 190 200 210 220 230 
ELVLKAAARASLATGVP— VTTHTAASGROGERGRPPFLS PKLEPSRVC I GHSDDT 

it i t i t t i tti itt i 

ii i ii tit itt iii i 

RLV T VNPF VSVATANSKVL I ELEPPF6DSY I WGRGEQQ I NHHWHKSGSS I GK AFTTTLRGA 

640 650 660 670 680 690 700 



240 250 260 270 280 290 300 

DDLS YL — TALLRGYL I GLDH I PHS A I GLEDN ASASPLLG I RSWOTRALL I K AL I DQGYMKQ I LVSNDWLFG 

l i i i i t t i it iii ii ii i i 

tilt t i ii *i iii ii it i t 

QRLAALGDTAWDFGSVGG VFTSVGK A I HQ VFGGAFRSLFGGMSW I TQGLL— GALLLWMG I NARDRS I AMTFL 
710 720 730 740 750 760 770 

310 320 330 340 X 

FSSYVTN I MDVMDR VNPD-QMAF I PLRV I PFYERRASHRKRCBASL 

■ t titt i t 

i i i t i i i i 

AVGG VLLFLS V — N VHADTGC A I D I ER3ELRCGSGVF I h'ND VEAWMDR YKF YPETP 
780 790 BOO 810 X 820 



9. LOW344-FIG1.PEP

ATXB$LEIDO PROBABLE E1-E2 TYPE CATION ATPASE 1B (EC 3.6.1.-).

ID   ATXB$LEIDO   STANDARD;   PRT;   974 AA.
AC   P12522;
DT   01-OCT-1989 (REL. 12, CREATED)
DT   01-OCT-1989 (REL. 12, LAST SEQUENCE UPDATE)
DT   01-OCT-1989 (REL. 12, LAST ANNOTATION UPDATE)
DE   PROBABLE E1-E2 TYPE CATION ATPASE 1B (EC 3.6.1.-).
OS   LEISHMANIA DONOVANI.
OC   EUKARYOTA; PROTOZOA; SARCOMASTIGOPHORA; MASTIGOPHORA.
RN   [1] (SEQUENCE FROM N.A.)
RA   MEADE J.C., HUDSON K.M., STRINGER S.L., STRINGER J.R.;
RL   MOL. BIOCHEM. PARASITOL. 33:81-92(1989).
CC   -!- CATALYTIC ACTIVITY: ATP + H(2)O = ADP + ORTHOPHOSPHATE.
CC   -!- SIMILARITY: BELONGS TO THE CATION TRANSPORT ATPASES FAMILY
CC       (E1-E2 ATPASES).
CC   -!- SIMILARITY: THE TWO L. DONOVANI CATION-TRANSPORTING ATPASE GENES
CC       ARE 98% HOMOLOGOUS.
DR   EMBL; J04004; LDCATP2.
DR   PROSITE; PS00154; ATPASE_E1_E2.
KW   HYDROLASE; ATP HYDROLYSIS; TRANSMEMBRANE; PHOSPHORYLATION;
KW   MAGNESIUM; ATP-BINDING.
FT   TRANSMEM     33    112   PUTATIVE.
FT   TRANSMEM    118    137   PUTATIVE.
FT   TRANSMEM   2615    286   PUTATIVE.
FT   TRANSMEM    235    321   PUTATIVE.
FT   TRANSMEM    631    651   PUTATIVE.
FT   TRANSMEM    662    684   PUTATIVE.
FT   TRANSMEM    6S8    712   PUTATIVE.
FT   TRANSMEM    738    761   PUTATIVE.
FT   TRANSMEM    813    840   PUTATIVE.
FT   TRANSMEM    869    887   PUTATIVE.
FT   MOD_RES     351    351   PHOSPHORYLATION.
SQ   SEQUENCE   974 AA;  107304 MW;  5132373 CN;



Initial Score = 18 Optimized Score = 68 Significance = 3.95

Residue Identity = 24% Matches = 93 Mismatches = 210

Gaps = 82 Conservative Substitutions = 0

X lO 20 30 40 SO 

MOTRR WLKS A A AGTLLGGLAGC ATWLDRSAG A I GS I RARP I T I SE AGFTLTHE 

it it it tit i < 

if ii it iii t i 

FLDPPRPDTKDT I RRSKE YGVD VKM I TGDHLL I AK EMC— RMLDLDPN I LTADKLPQ I KDANDLPEDLGEK YG 

500 5 1 0 520 530 540 550 560 



SO 70 80 SO 100 110 120 

D I CGSS AGFLR AWPEFFGSRK AL AEK A VRGLRAR A AGVRT I VDVSTFD I GRD VSLL AE VSRA AD VH I V A 

i t ii ii tiit i i ii iii t 

i t ii ii iiti i i ii iii i 

DMMLSVGGF AG VFPE HKFM I VETLRQRGYTC AMTGDGVND AP ALKRADV — G I AVHGATD A ARA A 

570 580 530 600 610 S20 630 



130 140 150 160 170 
ATGLWFDPPLSMRLR YVE ELT9FFLRE I QYG I EDT G I RAG 1 1 K VATTGK A TPFQ E 

i iii ii iii lit i tii ii 

t itt it iii tit i iti ii 

ADMVLTEPGLS V WE AML VSRE VFQRMLSFLT YR I SATLQL VCFFF I ACFSLTPK A YGS VDPNFQFFHL 

640 650 660 670 680 690 700 



180 190 200 210 220 230 240 

L VLK A AARASLATGVPVT THTAASQRDGERGRPPFLSPKLEPSRVC I GHSDDTDDLSYLTALLRGYL I G 

ll lit ii ii li i iiit 

ii iii ii it tt i till 

PVLMFML I TLLNDGCLMT I GYDHV I PS ERPQKWNL— PVVFVS AS I L AAV ACGSSLM 

710 720 730 740 750 



250 260 270 280 290 300 310 

LDH I PHSA I GLEDNASASPLl J3 1 RSW9TR ALL IKALI DQGYMKS I L VSND WLFGFSS — YVTN I MDVM 

i i i i t i tit i i iii iii iii I 

i iiit i i t t t ■ i i t i i t t t i i 

L LW I GLE GYSS9Y YENSWFHRLGL AQLPQGKLVTMMYLK- I S I S-DFLTLFSSRTGGHFFFYVP 

760 770 780 790 800 810 



320 330 340 X 

DRVNPDGMAF I PLRV I PFYERPAS — HPKRCGASL 

iiit ii i t t 

i i t i i i i i i 

PSP ILFCGAI I SLLV STMAASF WHK5RPDNVLTEGLAWGQTN 

820 830 840 R50 S60 



10. LOW344-FIG1.PEP

ATP0$OENBI ATP SYNTHASE ALPHA CHAIN, MITOCHONDRIAL (EC 3.6.1.

ID   ATP0$OENBI   STANDARD;   PRT;   511 AA.
AC   P05492;
DT   01-NOV-1988 (REL. 09, CREATED)
DT   01-NOV-1988 (REL. 09, LAST SEQUENCE UPDATE)
DT   01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE)
DE   ATP SYNTHASE ALPHA CHAIN, MITOCHONDRIAL (EC 3.6.1.34).
OS   OENOTHERA BIENNIS.
OG   MITOCHONDRION.
OC   EUKARYOTA; PLANTA; SPERMATOPHYTA; ANGIOSPERMAE.
RN   [1] (SEQUENCE FROM N.A.)
RA   SCHUSTER W., BRENNICKE A.;
RL   MOL. GEN. GENET. 204:29-35(1986).
CC   -!- FUNCTION: THIS IS ONE OF THE 5 CHAINS OF THE ENZYMATIC COMPONENT
CC       (COUPLING FACTOR CF(1)) OF THE MITOCHONDRIAL ATPASE COMPLEX.
DR   EMBL; X04023; MIOBATPA.
DR   PROSITE; PS00152; ATPASE_ALPHA_BETA.
KW   ATP SYNTHESIS; CF(1) COUPLING FACTOR; HYDROGEN ION TRANSPORT;
KW   HYDROLASE; ATP-BINDING; MITOCHONDRION.
FT   NP_BIND     171    178   ATP (BY SIMILARITY).
FT   ACT_SITE    373    373   BY HOMOLOGY.
SQ   SEQUENCE   511 AA;  55596 MW;  1250759 CN;



Initial Score = 9 Optimized Score = 68 Significance = 3.95 

Residue Identity = 23% Matches = 91 Mismatches = 223 

Gaps = 67 Conservative Substitutions = 0
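
The Residue Identity figures in these comparisons are consistent with the number of matches divided by the total number of aligned positions (matches plus mismatches plus gaps), truncated to a whole percent. That relation is inferred from the reported numbers rather than stated by the search program, and the function name below is hypothetical. A minimal sketch in Python:

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned positions, truncated to an integer.

    Assumption: the denominator is matches + mismatches + gaps. This is
    inferred from the reported figures, not taken from the program's manual.
    """
    aligned_positions = matches + mismatches + gaps
    return (100 * matches) // aligned_positions


# Values reported for the comparison against the ATP synthase alpha chain.
print(residue_identity(matches=91, mismatches=223, gaps=67))
```

With the values reported for this hit (91 matches, 223 mismatches, 67 gaps) the sketch gives 23, which agrees with the printed Residue Identity of 23%.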

X 10 20 30 40 50 60 

M9TRR V VLK S A A AGTLLGGL AGC A rWLDRSAQ — A I G-S I R ARP I T I SEAGFTLTHED I CGSSAGFL 

lit i I i ) t i it l ii i 

i t i i t i it i ii f t i i 

MEFSPRA AELTTLLESR I TNF YTNFQVDE I GRV I SVGDG I ARVYGLNE I 0 AGEM VEF ASGVKG I AL 
X 10 20 30 40 50 60 



70 80 SO 1 OO 110 1 20 
RAWPE FFGSRK ALAEK AVRRLRARAAGVRT I VDVSTFD I GRDVSLLAEVSRAADVH I — VAATG 

i til i i (i iiii tit i t ii ii 

i iii i t ii iiii i i i i i i i it 

NLENENVG I VVFGSDTA IKE GDLVKR TGS I VD V PAGKSLLGR VVDALGVP I DGRGALGDHE 

70 80 90 100 110 120 



130 140 ISO 160 170 ISO 190 
LWFDPPLSMRLRY VEELTQFFL-RE I QYG I EDTG I RAG I I K V ATTGK ATPFOEL VLK AAAR ASL ATGVP 

i tilt till ill i ill 

i ii i '. iiit iii i iii 

RRRVEVKVPG 1 1 ERK S VHEPM9TGI _K A VDSL VP I GRGBREL 1 1 GDRQTGK TA I A I DT I LNBKQMNSR ATSES 
130 140 ISO 160 170 180 190 



200 21 0 220 230 240 250 
VT THTAASQ— RDGERGRPPFLS — PKLEPSRVC I GHSDDTDDLSYLTALLRGYL I G LDH I PHSA I - 

i iii t i tit i i i i i i tit 

I 111 II 111 4 1*1 t t III 

ETLYCVYVA I GQKRST V AQLVO I LSEGNALE YS I L VAAT ASDPAPLQFL- APYSGC AMGE YFRDNGMH AL I I 
200 210 220 230 240 250 260 270 

260 270 280 290 300 310 

— GLEDNASA SPLLG I RSWQTRALL I KALI DQGYMKOI L VSNDWLFGFSSYVTN I MDVMDR VNPD 

itt tit t iii ii i i 

iii iii t iii ii i i 

YDDLSKOAVAYRSMSLLLRRPPGRGA FPGDVFYLHSRLLERAAKRSDQTGAGS — LTALPV I ETQAGD 

280 290 300 310 320 330 



320 330 340 X 

GMAF I PLRV I PF YERR ASHRK RCQ ASL 

i ■ i t i ii 

i ii ii it- 

VS AY I PTNV I S I TDGO X CLETELF YRG I R 

340 350 X 360 






Results file low344-fig1.res made by maryh on Wed 17 Apr 91 12:41:07-PDT.

Query sequence being compared: LOW344-FIG1.SEQ

Number of sequences searched: 43406
Number of scores above cutoff: 156

Results of the initial comparison of LOW344-FIG1.SEQ with:
Data bank = GenBank 65.0, all entries
Data bank = UEMBL 24_65, all entries

[Histogram: NUMBER OF SEQUENCES (logarithmic axis, 10 to 100000) versus SCORE (0 to 889) and STDEV]



PARAMETERS

Similarity matrix           Unitary    K-tuple               4
Mismatch penalty            1          Joining penalty      30
Gap penalty                 1.00       Window size          32
Gap size penalty            0.33
Cutoff score                73
Randomization group         0

Initial scores to save      20         Alignments to save   10
Optimized scores to save    20         Display context      10



SEARCH STATISTICS

Scores:    Mean    Median    Standard Deviation
           31      31        12.75

Times:     CPU            Total Elapsed
           00:54:30.02    02:41:00.00

Number of residues:            54775335
Number of sequences searched:  43406
Number of scores above cutoff: 156
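
The Significance column reported for each hit appears to be the number of standard deviations by which a score exceeds the mean of all scores. A minimal sketch in Python, assuming a mean initial score of 31 (the printed mean is partly illegible; 31 is the value that reproduces the listed significances) and the standard deviation of 12.75 given above. The function name is hypothetical:

```python
def significance(score: float, mean: float, stdev: float) -> float:
    """Number of standard deviations by which `score` exceeds `mean`."""
    return (score - mean) / stdev


MEAN_INITIAL = 31.0    # assumed: inferred from the listed significances
STDEV_INITIAL = 12.75  # standard deviation reported in the search statistics

# Initial scores of the three top-ranked sequences in the list of best scores.
for name, initial_score in [("PSEPTE", 889), ("M22863", 809), ("FVBOPD", 599)]:
    z = significance(initial_score, MEAN_INITIAL, STDEV_INITIAL)
    print(f"{name}: {z:.2f}")
```

Under these assumptions the three top hits come out at 67.29, 61.02 and 44.55, matching the values printed for them in the list sorted by initial score.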



The scores below are sorted by initial score.
Significance is calculated based on initial score.

A 100% identical sequence to the query sequence was not found.



The list of best scores is:



                                                        Init.   Opt.
Sequence Name   Description                     Length  Score  Score    Sig.  Frame

            **** 67 standard deviations above mean ****
 1. PSEPTE      Plasmid pCMS1 (from P. diminuta   1322    889   1304   67.29    0
            **** 61 standard deviations above mean ****
 2. M22863      Figure 1. Nucleotide sequence     1326    809   1309   61.02    0
            **** 44 standard deviations above mean ****
 3. FVBOPD      Flavobacterium sp. parathion h    1693    599   1281   44.55    0
            **** 8 standard deviations above mean ****
 4. X15898      Eimeria tenella mRNA for sporo     957    136    360    8.23    0
            **** 6 standard deviations above mean ****
 5. X14805      Mouse mRNA encoding DNA (cytos    4973    112    569    6.35    0
 6. CELPOLII    C. elegans RNA polymerase II la  12993    111    526    6.27    0
 7. HSHEPSH     Human hepatoma mRNA for serine    2363    108    562    6.04    0
 8. HUMHPSNA    Human hepsin mRNA, complete cd    1783    108    456    6.04    0
            **** 5 standard deviations above mean ****
 9. BLYAMY2     Barley (H. vulgare) alpha-amyla   1588    107    590    5.96    0
10. BLYAMYAA    Barley alpha-amylase type A is    1588    107    592    5.96    0
11. BOVGABARB   Bovine mRNA for gamma-aminobut    3010    106    574    5.88    0
12. BOVIGCAB    Bovine Ig germline gamma-2-cha    1979    104    479    5.73    0
13. BTIGG2HC    Bovine Ig germline heavy chain    1979    104    479    5.73    0
14. RRATP2      Rhodospirillum rubrum gene clu    4240    104    194    5.73    0
15. PDEMDH      P. denitrificans methanol dehyd   2314    103    425    5.65    0
16. MUSGT2A     M. musculus glucose transporter   2521    102    232    5.57    0
17. SMASFUABC   S. marcescens periplasmic-bindi   4583    101    583    5.49    0
18. RABIGHAB    Rabbit Ig mu chain secreted fo    1953    101    489    5.49    0
19. HUMIGHBD    Human Ig unproductively rearra    1127    100    384    5.41    0
20. MUSRGEB3    Mouse 18S, 5.8S, 28S rRNA gene    3061     98    360    5.25    0



The scores below are sorted by optimized score.
Significance is calculated based on optimized score.

A 100% identical sequence to the query sequence was not found.



The list of best scores is:

                                                        Init.   Opt.
Sequence Name   Description                     Length  Score  Score    Sig.  Frame

            **** 271 standard deviations above mean ****
 1. M22863      Figure 1. Nucleotide sequence     1326    809   1309  271.26    0
            **** 269 standard deviations above mean ****
 2. PSEPTE      Plasmid pCMS1 (from P. diminuta   1322    889   1304  269.45    0
            **** 261 standard deviations above mean ****
 3. FVBOPD      Flavobacterium sp. parathion h    1693    599   1281  261.10    0
            **** 15 standard deviations above mean ****
 4. TRN21TNPA   Transposon Tn21 tnpA gene for     3176     87    606   15.98    0
            **** 12 standard deviations above mean ****
 5. X17379      Sorghum vulgare mRNA for phosp    3147     80    597   12.71    0
            **** 11 standard deviations above mean ****
 6. NEUTRP1     N. crassa trifunctional tryptop   2750     85    594   11.62    0
            **** 10 standard deviations above mean ****
 7. BLYAMYAA    Barley alpha-amylase type A is    1588    107    592   10.89    0
 8. PDUMER      Plasmid pDU1358 (from S. marces   2153     95    590   10.17    0
 9. BLYAMY2     Barley (H. vulgare) alpha-amyla   1588    107    590   10.17    0
10. BPECYADE    Bordetella pertussis cyaD gene    2040     82    590   10.17    0
            **** 8 standard deviations above mean ****
11. MUSNFMG     Mouse NF-M gene for middle-mol    5471     81    586    8.72    0
            **** 7 standard deviations above mean ****
12. ACFTS140A   Fujinami sarcoma virus tempera    2715     86    584    7.99    0
13. SVGSII      Streptomyces viridochromogenes    2755     82    584    7.99    0
14. SMASFUABC   S. marcescens periplasmic-bindi   4583    101    583    7.63    0
15. HS11UL      Herpes simplex virus type 1 (H  108360     94    582    7.26    0
            **** 6 standard deviations above mean ****
16. ATUPRIREP   A. tumefaciens plasmid pRiA4b r   4638     80    581    6.90    0
17. HUMASPX     Human nonerythroid alpha-spect    7787     80    581    6.90    0
18. SERCYSA     S. erythraea rhodanese-like pro   3373     81    581    6.90    0
19. X51950      E. coli purHD operon for AICAR    3432     94    580    6.54    0
20. MUSHCK      Mouse hck gene for tyrosine ki    1960     81    580    6.54    0



1. LOW344-FIG1.SEQ

M22863 Figure 1. Nucleotide sequence of Flavobacterium op

LOCUS       M22863       1326 bp ds-DNA             UNA       15-JUN-1989
DEFINITION  Figure 1. Nucleotide sequence of Flavobacterium opd gene fragment.
ACCESSION   M22863
KEYWORDS
SOURCE      Unknown
  ORGANISM  Unclassified.
REFERENCE   1 (bases 1 to 1326)
  AUTHORS   Harper,L.L., McDaniel,C.S., Miller,C.E. and Wild,J.R.
  TITLE     Dissimilar Plasmids Isolated from Pseudomonas diminuta MG and a
            Flavobacterium sp. (ATCC 27551) Contain Identical opd Genes
  JOURNAL   Appl. Environ. Microbiol. 54, 2586-2589 (1988)
  STANDARD  unannotated staff_entry
BASE COUNT  279 a 363 c 392 g 286 t
ORIGIN



Initial Score = 809 Optimized Score = 1309 Significance = 271.26

Residue Identity = 98% Matches = 1316 Mismatches = 8

Gaps = 7 Conservative Substitutions = 0 

X 10 20 30 40 50 60 70 



CTGCAGCCTGACTCGGCACCAGTCGCTGCAAGCAGAGTCGTAAGCAATCGCAAGGGGGCAGCATGCAAACGA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 t 1 ( 1 1 t 1 1 t 1 1 1 i 1 1 1 1 t 1 1 t 1 1 t 1 1 1 1 t 1 1 » 1 i 1 t 1 I t 1 i t 1 1 1 1 1 1 1 1 1 1 1 1 1 t i 
1 1 1 1 1 r t 1 1 t 1 1 1 1 1 1 1 1 i * 1 1 1 1 1 i 1 1 1 t t t 1 1 1 1 1 1 1 1 t 1 1 1 1 1 1 • 1 t t t r 1 t t 1 1 1 1 1 1 1 1 1 1 1 t 1 t 1 t 

CTGCAGCCTGACTCGGCACCAGTCRCTGCAAGCAGAGTCGTAAGCAATCGCPiAGGGGGCAGCPiTGCAPlACGA 
X 10 20 30 40 50 60 70 

80 90 100 J\l/ 110 i 120 130 140 

GAAGGGTTGTGCTCAAGTCTGCGGIJCGCAGGAACTCTGCT6GGCGGCCTGGCTGGGTGCGCGACGTGGCTGG 

I I t I f I I I I I I I f I I < I • I ! I 1 I t < J I t I t I 1 I I I I I t ) I t ( I I t I t I < I I 1 t I I I I ) t I I t I I I I I I I I 

i t i i r i i t I i i t i i i i i f i i t i i t I I i i i i i i i ( i i i i ^ i i i i i i i i « i i i i i i I I i t. I i t t I t I i I I t i 

GAAGGGTTGTGCTCAAGTCTGCGGCCGCGAGAACTCTGCTCGGCGGCCTGGCTGGGTGCGCGACGTGGCTGG 
80 90 100 110 120 130 140 



150 IGO ii 170 180 190 200 210 

ATCGATCGGCACAGGCGATCGGATCAATACGTGCGCGTCCTATCACAATCTCTGAAGCGGGTTTCACACTGA 

t t i i i i i t i i i t t i i i i t l i i t t i t t i i i i i i i i t i i i t i i i i i t i i i i i i i i i i i t t i i > i t i i i i t i i 

t i i t i i i i i i i i i i t i ■ i i i ) » t i r i t i t t i i t i i i i i i i i i i i i i i i t i i i t i i i i i t i i i i i i i r i i r 

ATCGATCGGCACAGGCGATGCGATt^AATACGTGCGCGTCCTATCACAATCTCTGAAGCGGGTTTCACACTGA 
150 160 170 180 190 20O 210 



220 230 240 250 260 270 280 

CTCACGAGGACATCTGCGGCAGCTCGGCAGGATTCTTGCGTGCTTGGCCAGAGTTCTTCGGTAGCCGCAAAG 

t I I I I I I I I I I I t I I tititiiitrtttiittliitiiitiitltliilititttiltltiit ■ ■ l • l • l 
i i i i i i i i i i i t i i i t i t i i t t i i t i t i i i i i i ■ t i i i « i ) i t i t ( t i i i t i i i t r i i i i i i i i i i i i i i 

CTCACGAGGACATCT— CGGCAGCTr'GSCPiGGATTCTTGCGTGCTTGGCCAGAGTTCTTCGGTAG— CGCAAAG 
220 230 240 250 2GO 270 280 

290 310 320 330 340 350 360 

CTCTAGCGGAAAAGGCTGTGAGAGt?4ATTGCGCGCCAGAGCGGCTGGCGTGCGAACGATTGTCGATGTGTCGA 

i i I I i i i t I I i t i i i i i i i t i r t t t I i i i I I I I t i t t I I t t i i I I i t t I i I t i I I I I i i t t I I i i i i i i i i 
t t I I t i i i iifiitTitttiiititj'itiifitiiiiiiiitititiitiiitiiittfiiitiiiiiii 

CTCTAGCGCAAAAGGCTGTGAGAGtSATTGCGCGCCAGAGCGGCTGGCGTGCGAACGATTGTCGATGTGTCGA 
290 300 3 1 0 320 330 340 350 



370 380 3SO 400 410 420 430 

CTTTCGATATCGGTCGCGACGTCACviTTTATTGGCCGAGGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGG 

i i i i * i i t i i i t » i i t f i i i i i i » t i f i i i i i i i i i i i i i * i * t i » * i i i * i i i i » < i i i i » i i * i i t i i i 
i t i ■ i i t i t i t i i i i i t t i i i i i i i i i i t t t i i i i t i i i * i i t i i i i t i i i i i t i i i < i i i i i i i i t i i i i 

CTTTCGATATCGGTCGCGACGTCAWTTT ATTGGCCGAGGTTTCGCGGGCTGCCGACGTTCATATC— TGGCGG 

360 370 380 390 400 410 . 420 

440 450 460 470 480 490 500 

CGACCGGCTTGTGGTTCGACCCGCCACTTTCGATGCGATTGAGGTATGTAGAGGAACTCACAC— AGTTCTTC 

• i t i i i i t i i i i i i i t t i t l i i i t t i t i t t ) i t I I I i i i i i i i i r i i i i i i i t i t i i i t I i t t I i i i i i i i 
i « I I t i i t t I I I t i i i t x i i t t t t * ♦ t t t i i i t t I I 1 | i t i t I ( i » t I I I i i I I I I I I t I 1 I I I I i i t I I I 

CGACCGGCTTGTGGTTCGACCCGCCAC1 TTCGATGCGATTGAGGTATGTAGAGGAACTCACACTAGTTCTTC 
430 440 450 460 470 480. 490 500 



510 520 530 540 550 560 570 

CTGC-GTGAGATTCAATATG15CATUGAAG-ACACCGGAATTAGGGCGGGCATTATCAAGGTCGCGACCACAG 

till t i t i i i i t t < i i i r t i i i i : i i i i t i i i i t i i t i i • i t i t t i i i i t i i i i i i i i i i t i i i « i i i i i 
lilt i i i t i i t i i i i I I i t c i i i : i i i f t ■ i * i i i i i t f < t I < * i I ( t « < i t i f i < i I « > I i f t t i i i I t 

CTGCGGTGAGAT fCAATATGGCATCGAAGTACACCGGAATTAGGGCGGGCATTATCAAGGTCGCGACCACAG 
510 520 530 540 550 560 570 



580 590 GOO 610 620 630 640 

GCAAGGCGACCCCCTTTCAGGfiGTTAGTGTTAAAGGCGGCCGCCCGGGCCAGCTTGGCCACCGGTGTTCCGG 

i i i i I t i I I i I i i i t i i t I i i i i t i i i t i t t i i i i i i i t i i i t i i t t t i i i i i i i i i i i i t t i i i i i i i i i 
l l t I I l ■ I t l l l ) i l i t t t 1 l i t l i t i i r i | i i t i i | i t t t I t I l I 1 l 1 l t l l t t l I t l l t l l l | r t t t t 1 l 

GCAAGGCGACCCCCTT TCAGG AGT f AGTGT1AAAGGCGGCCGCCCGGGCCAGCTTGGCC ACCGGTGTTCCGG 
580 5?K> :?O0 6)0 620 630 640 



650 briG 67'..' BSC 690 700 710 

TAACCACTCACACGGCftGCAftGTCftGCGC&ftTRGTGftGCGAGGCAGGCCGCCATTTTTGAGTCCGAAGCTTG 

l i t l i i ■ i < i ( i • i f t i i i i i t i i i i r i : r i i i ( t t i i i t t i i i ( i i i i t l i t i l i i i l i i i i i i i i t i l i i 
i i i t » < i t i i i i i » t t i i t t t i i t t i i t i t » t * t i i t t i t i t i i i i i i t t i i i t i i » i i i t i i i i t i i i i t t 

TAACCACTCACACGGCP!GCAAGTC.'AGCGCGATGGTGftGCGPiGGCAGGCCGCCATTTTTGAGTCCGAAGCTTG 
650 660 670 6S0 6S0 700 710 



720 730 740 750 760 770 780 

AGCCCTCACGGGTTTGTATTGGTCftCAGCGATGATACTGACGATTTGAGCTATCTCACCGCCCTGCTGCGCG 

i i i i i i t i i • i » i ) ] t t i i i i i i i t i t t : r i t i t t r t t i t » ■ i I i ■ * i • i * t i i • i ' i * i i i t t t i i i t ■ t i 
i i i i i i i i i t i • i t i i t i i t i t i i t i t i t ) t i i i * i t l i i ■ t i i t i i i i » i i i t i • i t t i i i i i t i i i t i i i 

AGCCCTCACGGGTTTGTATTGGTCr-.CA&CGATGATACTGACGATTTGAGCTATCTCACCGCCCTGCTGCGCG 
720 730 740 750 7GO 770 780 

790 800 810 320 830 840 850 860 

GATACCTCATCGGTCTAGACCACATCCCGCACAGTGCGATTGGTCTAGAAGATAATGCGAGTGCATCACCGC 

i t i i • • I I i i I I t i i t i I t t I I I t i ( ( f i ( t 1 i I t t t i t t i t t I • • I I I i i i I t I I • i I I I t t i i i i i i i i i 
t i t t t t t i • I t t i i i i t I i t t t i i i ; < t t i i i i i i i i i i i t i i i t I t i I t I t i i I i i i ■ i i t < ( i t i i i i i t 

GPiTACCTCATCGGTCTAGACCACATCCCGCACAGTGCGATTGGTCTAGAAGATAATGCGAGTGCATCACCGC 
790 800 810 320 830 840 850 860 



870 880 890 900 910 920 930 

TCCTGGGCATCCGTTCGTGGCAAACACGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGCTACATGAAAC 

i i i i i i l i i t i i ■ t i i i t i ■ i i i i i t i i t t i i i t > i i t i i t f t i t i i i i i i i i i i i i i l t i l i f i t i i i i i l 
i t i i i i i i » i t t i i i i t i i i t i t t i * t i i t i i i t t i i i i i i i i i i t t t i i » i i t i t i t i t » i i i i i ( i i i i i 

TCCTGGGCATCCGTTCGTGGCPtAACPiCGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGCTACATGftAAC 
870' 380 890 300 910 920 930 

940 950 9SO 970 980 990 1000 

AAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCAACATCATGGACGTGATGGATC 
i t i i i i i i i i i i t i i t ■ t i i i i i i i t i i i i i i i t i t i < i i i i t t i * t i i i < i i * t i t * i i i i f i i * i t t i t i 

i i t i I i t i i i i i t i i i I I i i i i i i I r ( i l i t I i i t I I I t I i i I t I t I t i » I t I i i i i i i 1 i i i i I I t I t I i I 

AAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCAACATCATGGACGTGATGGATC 
940 950 960 970 980 990 lOOO 

1010 1020 1030 1040 1050 1060 1070 

GCGTGAACCCCGACG6GATGGCCTTCATTCCACTGAGAGTGATCCCATTCTACGAGAGAAGGGCGTCCCACA 

i i t i i i i i i i i i t i i i i » i i i t i t t t i i t i i t t i i t l i i i t i i i l i i i t i i i t i i i i t i l i i t i i i t i t i i 
i t I l i t • i i i i ■ ) ; ) t i i t ( t t i t t • i t i i i i i i i i i i i i i i i t I i i i t i i i ( t i t i i i I i 1 I t t i t t f I I 

GCGTGAACCCC6ACGGGATGGCCTTCATT— CACTGAGAGTGATCCCATTCTACGAGAGAAGGGCGTCCCACA 
1010 1020 103O H040 1050 ' iOGO 1070 

1080 1090 1 100 1 1 lO 1 120 1 130 1 140 

GGAAACGCTGCCAGGCATCACTGTGACTAACCCGGCGCGGTTCTGTGTCACCGACTTGCCGTGCATGACGCC 
■ i i i i i i i i i i i < i i i i t t i i i t i t t i t i i t i i i * i * i i t i « i i i * i i i i i i i i i i i i i i i i i i i i i i i i ■ 

i i I i I i i I i i t i t t r i i I t i ( i i I t : t i i i i t i i i i i t i i i i i > i i I i i i i i i i I t i i t I i i i i i i i i i t 

GGAAACGCTGGCAGGCATCACTGTGACTAACCCGGCGCGGTTCTGTGTCACCGACTTGCCGTGCATGACGCC 
1080 1090 UOO 1110 1120 1 130 1140 

1150 1160 1170 1180 1190 1200 1210 1220 

ATCTGGATCCTTCCACGCAGCGGCCACTATTCCCCGTCAAGATACCGAACGATGAAGTCGCGCATCGATCGA 
i i t i i i i ■ t i i i i i i * i i i i i i i i t i i i i t i i i i t i i i i i i i i i i i ■ i i * i i i i i i i i t t i i i i i t i i t i i i 

I t I I I t I I I I I I I I I I t t I t I t I I J I t t J I I 1 I 1 I I I I I I I I » t t I I 1 I I I I I I I I I I I 1 • I I > 1 ! I I I t I I 

ATCTGGATCCTTCCACGCAGCGGCCACTATTCCCCGTCAAGATACCGAACGATGAAGTCGCGCATCGATCGA 
1150 1160 1170 1180 1190 1200 1210 1220 



1230 1240 1250 1260 1270 1230 1290 

TAGGCATCTTCAATGTGATCAGGGCTGCCACCTCCAAAGCCGGTGGCCACCCCTGTCGATAGTCTTGAGGGA 

I I I I I t t I I I I I I I I I t I I 1 I I t t t I I I I t 1 t I 1 I t I ( t I 4 I I I I I I t I I I I I I I I I t I I t I t 1 ( I I t I I I t 
I I I I ( t I I I I I I I I t I I I t 1 I 1 1 t I t I f I I > t I 1 I I I t I I < ■ I I * • ■ t t ( < I I I I I ■ I < I < I I I I I I t I « f I 

TAGGCATCTTCAATGTGATCAGGGCTGCCACCTCCAAAGCCGGTGGCCACCCCTGTCGATAGTCTTGAGGGA 
1230 1240 1250 12GO 1270 1280 1290 



1300 1310 1320 X 

CGGTAGCGACGACCGTGCTTTTCGTGAACTGCAG 

i i i ■ i I I i ■ I t i t I t t i > i i i i i i i i t i i t ( i i I 

I I I I ! I t I 1 I I » I I t t t I I t t t t I ( t t t I I t I t I 

CGGTAGCGACGACCGTGCTTTTCGTGAACTGCAG 
1 300 1310 1. 320 X 



2. LOW344-FIG1.SEQ

PSEPTE Plasmid pCMS1 (from P. diminuta) phosphodiesterase

LOCUS       PSEPTE       1322 bp ds-DNA             BCT       15-MAR-1989
DEFINITION  Plasmid pCMS1 (from P. diminuta) phosphodiesterase (opd) gene,
            complete cds.
ACCESSION   M20392
KEYWORDS    phosphotriesterase.
SOURCE      Plasmid pCMS1 (from Pseudomonas diminuta) DNA.
  ORGANISM  Plasmid pCMS1
            Prokaryota; Bacteria; Plasmid pCMS1.
REFERENCE   1 (bases 1 to 1322)
  AUTHORS   McDaniel,C.S., Harper,L.L. and Wild,J.R.
  TITLE     Cloning and sequencing of a plasmid-borne gene (opd) encoding a
            phosphotriesterase
  JOURNAL   J. Bacteriol. 170, 2306-2311 (1988)
  STANDARD  simple staff_review
FEATURES    Location/Qualifiers
     CDS    S3..1040
            /note="phosphotriesterase protein"
BASE COUNT  278 a 367 c 392 g 285 t
ORIGIN      5 bp upstream from PstI site.



Initial Score = 889 Optimized Score = 1304 Significance = 269.45
Residue Identity = 98% Matches = 1313 Mismatches = 6

Gaps = 11 Conservative Substitutions = 0



X 10 20 30 40 50 GO 70 

CTGCAGCCTGACTCGGCACCAGTCGCTGCAAGCAGAGTCGTAAGCAATCGCAAGGGGGCAGCATGCAAACGA 

1 1 t I I 1 I I t I I I I I t 1 I I 1 I I 1 I I t I t I 1 I 1 t t 1 I I I 1 1 I T t I I t I I 1 1 1 t t t I I 1 I I 1 1 I I I I I t I t I 1 I I 
ttllllttllltlllllllllttltllllllllllltltllllltllllllltlllltlllltlltlllllt 

CTGCAGCCTGACTCGGCACCAGTCGCTGCAAGCAGPiGTCGTAPiGCPiATCGCAAGGGGGCAGCATGCAPiACGA 
X 10 20 30 40 50 60 70 



SO 90 lOO 110 120 130 140 

GAAGGGTTGTGCTCAAGTCTGCGGCCGCAGGAACTCTGCT(aGGCGGCCTGGCTGGGTGCGCGACGTGGCTGG 

i i i i i t i t i i i i i i t i i t i i i i i i i i i t i i i i i i i i i i I i i i i i • i i • i i i i t i i i i i i i i i i i i t t t • • 

i i i i i i i i t i i i i t i i ■ i • i i i i i i i t i t i i i i t t i i t 1 « i i i i f i • i t i i i i i i i i i t i i i t t i i i i i i 

GAAGGGTTGTGCTCAAGTCTGCGGCCGCGAGAACTCTGCTCGGCGGCCTGGCTGGGTGCGCGACGTGGCTGG 
80 90 100 110 120 130 140 

150 160 14, 170 180 190 200 210 

ATCGATCGGCACAGGCGATCGGATCAATACGTGCGCGTCCTATCACAATCTCTGAAGCGGGTTTCACACTGA 

i t i i i i I i i i i I i i i t t i i i t i i i i i i i I i i i i i i i i i i i t I i t i i t ■ i i i i i i i t i i i t t i i i i t i i i i 

i i i i i i i i i i i i i t t i i t i i i i i i t i t ■ i i i t i t i i i i i i i i i t i i i i t i t i i t i i i i i t i i i i t i t i t i 

ATCGATCGGCACAGGCGATGCGATCAATACGTGCGCGTCCTATCACAATCTCTGAAGCGGGTTTCACACTGA 
150 1BO 170 180 190 200 210 



220 230 240 250 2BO 270 280 

CTCACGAGGACATCTGCGGCAGCTCGGCAGGATTCTTGCGTGCTTGGCCAGAGTTCTTCGGTAGCCGCAAAG 

i i i I i t i r i t i i i i i i i t t t i i i i i i i i t i i t i t i i i i ■ • i i t t i i i i i i i i i i t i < i i i i i t i i i i i i i 
i t t i i i t i i i i i i i < i i i i i i i t i t i i i i t t i i i i i i i i i i i i t i i i ( i i i i i i i i < i i i i i i i i i i i i t 

CTCACGAGGACATCT— CGGCAGCTCGGCAGGATTCTTGCGTGCTTGGCCAGAGTTCTTCGGTAG— CGCAAAG 
220 230 240 250 260 270 280 



290 300 310 320 330 340 . 350 3GO 

CTCTAGCGGAAAAGGCTGTGAGAGGATTGCGCGCCAGAGCGGCTGGCGTGCGAACGATTGTCGATGTGTCGA 

t i i t i i i • i • t t i i i i t i t t i i i i i t t i i i i i i t i i t t t t i i i i i i > i t i ) i t t i l • t i t t i i i • i i i i i t i 
i i i i i i i i i > i i i i t i t i i i i i i i i t i i i i i t i i i i i t i i i i i i i i i i t i i i i t i i i > i i i i i i i i i i i i i i 

CTCTAGCGGAAAAGGCTGTGAGAGGATTGCGCGCCAGAGCGGCTGGCGTGCGAACGATTGTCGATGTGTCGA 
290 300 310 320 330 340 350 



370 380 390 400 410 420 430 

CTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGAGGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGG 

i i i t i i i t i i i i i i i i i t ■ i i i i t i i t i ■ i i t i i i i t t i i i i i i i i i i i t l i i i i ■ <( t i t i i i i iiitii 

t I I I I I I I I t I I I I I ! I I I I I I 1 I I I I 1 ■ I t I > I I 1 t I I I I t 1 I I I 1 I I t I I f I I I I I I I I I I I I I I I I I I 

CTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGAGGTTTCGCGGGCTGCCGACGTTCATATC-TGGCGG 
360 370 380 390 400 410 420 



440 450 460 470 480 490 500 

CGACCGGCTTGTGGTTCGACCCGCCACTTTCGATGCGATTGAGGTATGTAGAGGAACTCACAC— AGTTCTTC 

i I t I t t i i t i I i t i I I t i i i t r > I > i i i t i t i i i i i t r i i t t t t t t r I i i i I I t > I t t i i I t I t i i i i ■ t I 
I i I i t t i ( i i i I i i i i i I i t t t I I i t i t i t i i i t ■ i t I i t t i i 1 I i < i i t i i i I l i i i > I i t i i i i i t t I i 

CGACCGGCTTGTGGTTCGACCCGCCACTTTCGATGCGATTGAGGTATGTAGAGGAACTCACACTAGTTCTTC 
430 440 450 460 470 480 490 500 



510 520 530 540 550 560 570 

CTGC-GTGAGATTCAATATGGCATCGP,riG-ACACCGL r ;AA7TAGGGCGGGCATTATCAAGGTCGCGACCACAG 



CTGCGGTGAGATTCAATATCGCATCGAAGTACACCGGAAT^ 

510 520 530 540 550 560 570 



580 530 GOO 610 620 630 640 

GCAAGGCGACCCCCTTTCAG^AGiTTfiGTGTTPiAftGGCSGCCGCCCGGGCCAGCTTGGCCACCGGTGTTCCGG 

i i i i i i i i i t t t i t ■ t i i i ; t r t i t ^ l t : i ; t i t : i t i i i i i i i i i i i i i i i t i i i i i i i i t t i i i i i i i t i 
i i i i i i i i t i f t t i i i i i i i t i i < > r t i t : t ' t i i i t i t i t i i i i t i i t • t i i i • i i i i t t i i i t i i i i t i j 

GCAAGGCGACCCCCTTTCAGGAGTTAG1GTTAAAGGCGGCCGCCCGGGCCAGCTTGGCCACCGGTGTTCCGG 
580 590 GOO 610 620 630 640 

650 660 670 680 630 700 710 

TAACCACTCACACGGCAGCAAGTCAGCGCGATGGTGAGCGAGGCAGGCCGCCATTTTTGAGTCCGAAGCTTG 

i i i I t i I I i i t i i i i t I i i i t i t i I ( r i f i ■ i I t i i i i i i t i t i t t t i i i i i i i i i i i t i i i i i i i i i i i i i 
i i i i t i i i t t t < i i ■ t i t i t t i i t : • ; i t - i i i i t t i t i t i t i i i i i i i t i t i i i i t i t i i i t t i i i i i i i i 

TAACCACTCACACGGCAGCAAGTCAGCGCGATGGTGAGCGAGGCAGGCCGCCATTTTTGAGTCCGAAGCTTG 
650 660 67C 680 690 700 710 

720 730 740 750 760 770 780 

AGCCCTCACGGGTTTGTATTGGTCACAKCGATGATACTGACGATTTGAGCTATCTCACCGCCCTGCTGCGCG 

i i i t t t i i i i t t t t t i i t i i i ■ t i i r t i t i t i i t i t i i t I i t i t i I i I t i i i i i i i t i t i t i t i i i i i i i i t 
l I I l 1 l t l t t l I ( I » t I l i I l t l I t i r l t : t ) l l l I 1 l i l t t l l • t l t l l t l i t l t i ( l l 1 l I l l I t t l t l l 

AGCCCTCACGGGTTTGTATTGGTCACAGCGATGATACTGACGATTTGAGCTATCTCACCGCCCTGCTGCGCG 
720 730 740 750 760 770 780 

790 800 810 820 830 840 850 860 

GATACCTCATCGGTCTAGACCACATCCCGCACAGTGCGATTGGTCTAGAAGATAATGCGAGTGCATCACCGC 

t i i t i i i i i i ■ ■ ■ ■ i i * t i i i t i i i t i i i i t i i i i i i i i i i i i i i t i i t i i r i t i i i i t i i t i t i i i i i i i t 
tiiiitiiitiiitiiitiiittfttiii«ti«tititttiiittitiitiiiritiiftiiiitiiti<it 

GATACCTCATCGGTCTAGACCACATCCCGGACAGTGCGATTGGTCTAGAAGATAATGCGAGTGCATCACCGC
790 800 810 820 830 840 850 860

870 880 890 900 910 920 930

TCCTGGGCATCCGTTCGTGGCAAACACGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGCTACATGAAAC

i i i i i i l i i i i i i i i i i i i : ■ i i i t i i : t i i : t i i % i ( i j i i i j i i i i i i i i i i i i i i i i i i i i i i i i i i i t 
i i i i i t i i i i i i i i i i i i t t t i l i i l t t i t i i i t t t i i i t i i i i i i i i i i i i i i t ( i i i i » i i i i i i i i i 

TCCTGGGCATCCGTTCGTGGCAAACACGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGCTACATGAAAC
870 880 890 900 910 920 930

940 950 960 970 980 990 1000

AAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCAACATCATGGACGTGATGGATC 

■ I I I i I I I i I I I f i I I i I t i i i i i i : t t • 1 ■ I I i f t I 1 i t I f I 9 I r t i i t i i i i i i i i i I i i i t I t I i I i 

I I I t i t i i i I I i i t t i i t t < i i i t : i i t > ■ i t t i > i ■ i f I I I I I ■ t > f i i i i i i i t I ■ I i i I ( i t t i t i i i i 

AAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCAACATCATGGACGTGATGGATC 
940 950 960 970 980 990 1000

1010 1020 1030 1040 1050 1060 1070 

GCGTGAACCCCGACGGGATGGCCTTCATTCCACTGAGAGTGATCCCATTCTACGAGAGAAGGGCGTCCCACA

i i i i i i i i ■ t i i i t i t i i i i i i i t i i i t i i t i i i i t i i i i i i i i i i ■ i t t i i i i i i i i i i ( i i i i i i t i i i 
i i i i i i i t i i i i t i i t i t t i i i > t i t t i 1 iiitiiittitiiitiiiiititiitiiiitiiiiiitiiii 

GCGTGAACCCCGACGGGATGGCCTTCATT-CACTGAGAGTGATCCCATTCTACGAGAGAAGGGCGTCCCACA
1010 1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 1130 1140

GGAAACGCTGCCAGGCATCACTGTGACTAACCCGGCGCGGTTCTGTGTCACCGACTTGCCGTGCATGACGCC

i i I I i i t i i I i i i i t i : i t i t i i : t t t i i t I t t i t i i ■ t i t t i a t r i i i i t i r t i ( t i t i t t i i i i t i I I t 
I I I i i i I t I I i i i i t I I t t i i t t t i < i ; : i I i i i I i i i i I i I i I i i I i t f I I i i i t I i I t t I t i i i i i t I < 

GGAAACGCTGGCAGGCATCACTGT(SACTAAi:;CCGGCGCGGTTCTGTGTCACCGACTTGCCGTGCATGACGCC 
1080 1090 1100 1110 1120 1130 1140

1150 1160 1170 1180 1190 1200 1210 1220

ATCTGGATCCTTCCACGCAGCGGCCACTATTCCCCGTCAAGATACCGAACGATGAAGTCGCGCATCGATCGA 

I i I I I I I i i I I I I i l I ; ! t i i i : t I t I > I t I t t I i t ! I i t I I t I I t I i I i i i » t I t t I i i I I i i i i i I 

t i i i i i i i i I » t i i » t i i » i i t i i i i t i t * i i t i i i i i i t r t t i t > t t » i i ( » ■ • i i i i i I t i tiiit 

ATCTGGATCCTTCCACe?CPA~CGGCCACT A ( If'.l "CCG TC A AG AT ACCGA ACG ATG A AGTCGCGC ATCGA 

1150 1160 1170 1180 1190 1200 1210

1230 1240 1250 1260 1270 1280 1290

TAGGCATCTTCAATGTGATCAGGGCTGCCACCTCCAAAGCCGGTGGCCACCCCTGTCGATAGTCTTGAGGGA

i i i i i i i i t i i i i : i t t ♦ • t i f ■ ( t t c i i t i t i i i i ■ i i i i i t i i i i i t i i i i i i i i i t t i i i 

ililiti)tliliiiilft:titilft:: :ttii iit<iililiiii<itititliiiillllii<«lil 

TAGGCATCTTCAATGTf5ATCftGGSr;ri3r:;;Ai FCCA A AGCCG3 TGGCCACCCCTGTCGAT AGTCTTGAGGGA 
1220 1230 1240 1250 1260 1270 1280



1300 1310 1320 X

CGGTAGCGACGACCGTGCTTTTCGTGAACTGCAG



cggtagcgacgacclTtgcttttcg: r&/-; 

1290 1300 1310 1320 X



3. LOW344-FIG1.SEQ

FVBOPD Flavobacterium sp. parathion hydrolase gene, compl

LOCUS FVBOPD 1693 bp ds-DNA BCT 15-JUN-1990
DEFINITION Flavobacterium sp. parathion hydrolase gene, complete cds.
ACCESSION M29593
KEYWORDS parathion hydrolase.
SOURCE Flavobacterium sp. (strain ATCC 27551) DNA, clone pPDL2.
ORGANISM Flavobacterium sp.
Prokaryota; Bacteria; Gracilicutes; Scotobacteria; Neisseriaceae;
Flavobacterium; sp.
REFERENCE 1 (bases 1 to 1693)
AUTHORS Mulbry,W.W. and Karns,J.S.
TITLE Parathion hydrolase specified by the Flavobacterium opd gene:
Relationship between the gene and protein
JOURNAL J. Bacteriol. 171, 6740-6746 (1989)
STANDARD simple staff_entry
FEATURES Location/Qualifiers

misc_signal 312..317
/note="-35 region"
misc_signal 334..339
/note="-10 region"
misc_binding 408..411
/note="ribosome binding site"
CDS 413..1516
/note="parathion hydrolase"
BASE COUNT 372 a 497 c 477 g 347 t

ORIGIN 1 bp upstream of BamHI site.

Initial Score = 5EK3 Optimized Score = 1281 Significance = 261.10
Residue Identity = 96% Matches = 1304 Mismatches = 14

Gaps = 28 Conservative Substitutions = 0

X 10 20 30 40 50 60 

CTGCAGCCTGACTCGGCACCAGTCGCTGCAAGCAGAGTCGTAAGCAATCGCAAGGGGGCAGC



i i i j 



CGGTTCAGATCTGCAeCCTGACTCf^CP.CC^VTCGCTGCftfiGCAGAGTCGTAAGCAATCGCAAGGGGGCAGC 
350 360 370 380 390 400 410

70 80 90 100 110 120 130

ATGCAAACGAGAAGGGTTGTGCTCAAGTCTGCG GCCGCAGGAACTCTGCTGGGCGGCCTGGCTGGGTGC

i I I I I i • i t f i t i i i i t i i i i r i t i i t i t i i i i t i i i I I t I i i t i i t i i ( i i i i I i i i i t i i i 

i i i i i i t I i i I I t I I i » i t I i : i t i : : r i i i i t i t 1 i t i i t i t > i i t t i i t f i . t t i i t i i < i i i i i i t 

ATGCAAACGAGAAGGGTTGTGCTC.^.AGTCT GCC?:GCCGCCGCAGG;AACTCTGCTCGGCGGCCTGGCTGGGTGC 
420 430 440 450 460 470 480 490 

140 150 160 170 180 190 200

GCGA-CGTGGCTGGATCGATCGGCACAGGCGATCGGATCAATA-CGTGCGC-GTCCTATCACAATCTCTGAA

till t i I i i i I i I i r i t t t t i t t t : t i t t t i i » i i i t i i i i i i i i t i i i i i i t i t t i i i i t i i i i t i i 
i i I i t i i i i i i i i i i i I t t I i t i t * i t t i t t t I I t i i i t i * I t i i i i i i i i i i t i t ) i i i » i i i i i i i i 

GCGAGCGTGGCTGGAl"CGAir.GG:C;PiC^r,:ttUf=;ATCGGATCAATACCGTGCGCGGTCCTATCACAATCTCTGAA 
500 510 520 530 540 550 560 

210 220 230 240 250 260 270

GCGGGTTTCACACTGACTCACGAGGACATCTGCGGCAGCTCGGCAGGATTCTTGCGTGCTTGGCCAGAGTTC



■ i i i i 



GCGGGTTTCACACTGACTCfU"GAGi.'£.~ HTl L ; :CGGCAGCTCGGCAGGATTCTTGCGTGCTTGGCCAGAGTTC 
570 580 590 600 610 620 630

280 290 300 310 320 330 340 

TTCGGTAGCCGCAAAGCTCTAGCGGAAAAGGCTGTGAGAGGATTG CGCGCCAGAGCGGCTGGCGTGCGA

« i i i i i t t t i i t i i i i i i i i i i t t i : » : » * i i t i » i i i i i i i i i i t r i i i t i i i i i i i t i i t i i t t i i i 

i i i i i t i i i i i t » i i i i i i ) i t i » » ( » i i i i i t i i i t t i t i i i t » t i i t i i i t i i i i i i i i i i i t i i i i 

TTCGGTAGCCGCAAAG-CTCTAUCi-'-^^ - '' t- ■■I rGTr;AGOCGATTGCGCCGCGCCAGAGCGGCTGGCGTGCGA 



640 650 660 670 680 690 700

350 360 370 380 390 400 410

ACGATTGTCGATGTGTCGACTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGAGGTTTCGCGGGCTGCC

i i i t i i i i i f i t t i i t t i i ; t i t i i t t ■ i i i i « : i t i i i i i t t i t i i i i i i i i i i i i i t i i i i i i i i i < i i i 
i i i i i i i t i ( » i i i i i i i t f i t i i i i * i t t « » i : i ; i i i i i i t t t r t i i i i i i i i i i i i i t i i i i i i i f i 

ACGATTGTCGATGTGT(;GACTTTC:!-;f-VT,;TCL-G:TCGCGACG;TCAGiTTlATTGGCCGAGGTTTCGCGGGCTGCC 
710 720 730 740 750 760 770 

420 430 440 450 460 470 480 

GACGTTCATATCGTGGCGGCGACCGGCTTGTGGTTCGACCCGCCACTTTCGATGCGATTGAGGTATGTAGAG

I i i i i t i i i i i i i i t i i i i i i : i i i i i i t i i » t t t t t i t i t i I i i i t i i t i t i i t i t i t i i i i i i i i 

i i t i i i i i t t i i i i ■ i t i i i i i i i i t t * t ' i i i t i i i i i i i t t » i t i i i i i i t i i i i t i i i i t i i i t i i i 

GACGTTCATATCGTGGCGGCGACCGGCTTGTGGTTCGACCCGCCACTTTCGATGCGATTGAGGAGTGTAGAG
780 790 800 810 820 830 840 850

490 500 510 520 530 540 550 

GAACTCACACAGTTCTTCCTGCGTGAGATTCAATATGGCATCGAAGACACCGGAATTAGGGCGGGCATTATC

I i i t I i t I I f t I I t I I I I I I I t i : ! 1 t I i i i « t t t I i i i « t t i t t i t i • I I i t I I I I t i i i i i i t • t I I I ( I 
i t t i i t i i i i i i i i i i i i t t i i i t i t t t i i i t i t i i i i t t i i i i i i i t t t i i i t i i i t i i ■ t i i i i i i i i i i 

GAACTCACACAGTTCTTCCTGCGTGAGATTCAATATGGCATCGAAGACACCGGAATTAGGGCGGGCATTATC
860 870 880 890 900 910 920

560 570 580 590 600 610 620

AAGGTCGCGACCACAGGCAAGGCGACCCCCTTTCAGGAGTTAGTGTTAAAGGCGGCCGCCCGGGCCAGCTTG

i t i i t t t i i I i r ! I I I I t I 1 I ! I ! » t t I ' i > t I I t » t t t 1 1 t I t t 1 I t I I 1 I I I I I I I I I I 1 I 1 f I I 

t i i i i i i i i i i i i t i i i i i i t i i i i i t i i i i t i t i t r t i i t t i i i i i i i t i t i t i i i i i « t i I i i i i r i » i t 

AAGGTCGCGACCACAGGCAAGGCGACCC-CCTTTCAGGAGTTAGTGTTAAAGGCGGCCGCCCGGGCCAGCTTG 
930 940 950 960 970 980 990 

630 640 650 660 670 680 690 700

GCCACCGGTGTTCCGGTAACCACTCACACGGCAGCAAGTCAGCGCGATGGTGAGCGAGGCAGGCCGCCATTT

i i i i t t t i i i t i t ( i i t i i t i t t t i * i : i i i i • t t i i i i i i t i i i i i i i i t i i i t it i i t i i i i i i i t t t 
iiiiiiiitiiitiiititiitiitttiiiiiiiiitiitiiitiiiiiitiiit ii i i i i i i i i i i « • i 

GCCACCGGTGTTCCGGTAACCACTCACACfjGCAGCAAGTCAGCGCGATGGTGAGC— AG— CAGGCCGCCATTT 
1000 1010 1020 1030 1040 1050 1060 

710 720 730 740 750 760 770 

TTGAGTCCGAA-GCTTGAG-CCCTCACGGGTTTGTATTGGTCACAGCGATGATACTGACGATTTGAGCTATC

I I I I I I I t I I t lltllll t I ( I t 1 t t I T | ( I | 1 I | 1 | | t t I I I I I 1 I I I I I 1 1 I I I I I t I t I I I I • I i 1 I 

i i i I i i i i t i i irtiitt > t i t t I i t t t i ) t ) i i i i i i • i t i I • i i i • t ! t i < • i i • ■ t i • i i i i i i • i i 

TTGAGTCCGAAGGCTTGAGCCCCTCACr.GGTTTGTATTGGTCACAGCGATGATACTGACGATTTGAGCTATC 
1070 1080 1090 1100 1110 1120 1130 

780 790 800 810 820 830 840 

TCACCGCCCT-GCT--GCGCGGATACCTCATCGGTCTAGACCACATCCCGCACAGTGCGATTGGTCTAGAAG

i t i i t I i I i i tit i i t t i i t t ( i : t ; I i t i i I i i t t t t t t t i i i i ■ i i i i t i i i i i i i i t I i i i i i i i i 

i i i t i » i i i i iii i t • : i i i t i i i i t i . i • i t t t i i i t i i i • i ■ i i i i i t • i i i i i i t * t « i i i i i i i i 

TCACCGCCCTCGCTGCGCGCGGATV^CCl CATCGGTCTAGACCACATCCCGCACAGTGCGATTGGTCTAGAAG 
1140 1150 1160 1170 1180 1190 1200

850 860 870 880 890 900 910 

ATAATGCGAGTGCATCACCGCTCCTGGGCATCCGTTCGTGGCAAACACGGGCTCTCTTGATCAAGGCGCTCA 

i i ■ i i t i i i i t > i i i i i t i i i i i i i i t i t i i i i t > i i i i i i i t i i i i i t i i i i i i i i i i i i i » i i i i i i i 
■ i i i i i i i i i i i t i i i i t i i i i i i i i i i i t i i t i i i i i i t i i i t i i i t i i i t i i i i i i t i i i i i i i i i i i 

ATAATGCGAGTGCATCAGCCCTCCr^GG-.CATCCGTTCGTGGCAAACACGGGCTCTCTTGATCAAGGCGCTCA 
1210 1220 1230 1240 1250 1260 1270 1280 

920 930 940 950 960 970 980

TCGACCAAGGCTACATGAAACAAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCA

i I i I i i I t t I i i • i t i i t ; i t i i , t i » > t i ) f i • r * i t t t • i i i i t • i i i i • i i i i i i i i • i i i • i i i i i • t 
iitiitiiitttttiiiiiiitiiitiiitiiiittiitiitiiiiiiiiiiiittiiiiiiiitiiiiiii 

TCGACCAAGGCTACATGAAACAAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCA 
1290 1300 1310 1320 1330 1340 1350 

990 1000 1010 1020 1030 1040 1050 

ACATCATGGACGTGATGGATCGCGTGAACCCCGACGGGATGGCCTTCATTCCACTGAGAGTGATCCCATT-C

i i » i i t i i i i t i i ♦ t i » i > i i i * t t t t t » i t t i i i i i t i i ( i » » i i i i i i i i i i i » t i i i » i i i i t i i i i i 
i i i i i i i i i i i i i i i i » t i i i i i i i t i i t i t t i t i i i i i i t i * t i i i i » i i t i i i i i i i i i i i i i i i i » i i 

ACATCATGGACGTGATGGATCGCciT'GAACCCCi^ACfeGGATGGCCTTCATTCCACTGAGAGTGATCCCATTCC 
1360 1370 1380 1390 1400 1410 1420 

1060 1070 1080 1090 1100 1110 1120

TACGAGAGAAGGGCGTCCCACAGGAAACGCTGCCAGGCATCACTGTGACTAACCCGGCGCGGTTCTGTGTCA

i i i i i i i i i i i \ t i i i i i i t i t i t i i i i i 1 t : i i ( i i i i i i i i i i t i i t i i i i i i i i i t i i i i i i i i i i i 
i t i i i i | i • i I I I I I i I i i t I i I t I » t i t : ( I I I I I ( I I i I I I > I i I i I I I I I I I I I I i I I i I I I i t i t i 

TACGAGPiGPAGGGCGT(:CCnr:A'-C^A^V RC i a.RCAGf5CftTCACTGTGftCTAACCCGGCGCGSTTCT-TGTCA 



1430 1440 1450 1460 1470 1480 1490

1130 1140 1150 1160 1170 1180 1190 

CCGA-CTTGC CGTGCATGACGCCATCTGGATCCTTCCACGCAGCGGCCACTATTCCCCGTCAAGATACC

till i i i i i ill i i i i i i i i i j i i t i i i t i i i i » i i t i i i i i i i i i i i i i i i i t t i i i i i i i i i t 

I t I I I I I I I III I I t I I t t t I > I t I ' I t I t I t t t I > I I I I t I t t I I I T I I I I I > I I I t I 1 t I I I I 

CCGACCTTGCGGGCGT-CATGACGCCATCTGGATCCTTCCAGCCAGCGGCCACTATTCCCCGTCAAGATACC 
1500 1510 1520 1530 1540 1550 1560 



1200 1210 1220 1230 1240 1250 1260

GAACGATGAAGTCGCGCATCGATCGATAGGCATCTTCAATGTGATCAGGGCTGCCACCTCCAAAGCCGGTGG 

I I t I I f I I I I I I I I 1 I I I I I I I ( I I I I t • I I I 1 I t I t I I I I I I f I ■ I I I I I I I I I I I t I I I t I I | | 11(1 

I t i t I i i I I t t i i i i i t i i i r i t i i i i i i « t i •• i t ( • i i i i t f i i t t i i i i t i i i i i t i i i i • i i titt 

GAACGATGAAGTCGCGCATCGATCGATAGGCATCTTCAATTTGATCAGGGCTGCCACCTCCAAAGCC-GTGG 
1570 1580 1590 1600 1610 1620 1630 



1270 1280 1290 1300 1310 1320 X 

CCACCCCTGTCGATAGTCTTGAGGGACGGTAGCGACGACCGTGCTTTTCGTGAACTGCAG 

i i t i I I t i i t i i i t t i i \ i i I t titi i i i i i i i i » i t i i i i i i i t i i l t i i i 

i i i t i t i i i i i i i t i i i i i i i i i t i i tiit t i i t i i i i i i i i t i i i i i i i i i 

CCACCCCTGTCGATAGTCTTGA-GGAC-GTAGGGCACACCGTGCTTTTC — GAACTGCAG 
1640 1650 1660 1670 1680 1690 X
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TRN21TNPA 
LOCUS TRN21TNPA 3176 bp ds-DNA BCT 15-MAR-1988
DEFINITION Transposon Tn21 tnpA gene for transposase.
ACCESSION X04891
KEYWORDS tnpA gene; tnpR gene; transposase; transposon.
SOURCE Transposon Tn21.
ORGANISM Transposon Tn21
Prokaryota; Bacteria; Transposon Tn21.
REFERENCE 1 (bases 1 to 3176)
AUTHORS Ward,E. and Grinsted,J.
TITLE The nucleotide sequence of the tnpA gene of Tn21
JOURNAL Nucleic Acids Res. 15, 1799-1806 (1987)
STANDARD simple automatic
COMMENT [1] enum. 1 to 3176.
FEATURES Location/Qualifiers
<1..54
/note="tnpR gene"
57..3023
/note="transposase (AA 1-988)"
BASE COUNT 652 a 1057 c 915 g 552 t



Initial Score = 87 Optimized Score = 606 Significance = 15.98 

Residue Identity = 52% Matches = 761 Mismatches = 472

Gaps = 212 Conservative Substitutions = 0 

X 10 20 30 40 50

CTGCAGCCTGACTCGGCA — CCAG-TCGC — TGCAAG CAGAGTCGTAAGCAATCGCAA 

i i i i i t i i tit i i i i i i iii i ■( i ii * lit i i 

iiititit tit tiiitt iii i ii ii)t i i i i t 

TCCATCCTGTCCGCCGCC-GAGCGt^GAAPiGCCrGCTGGCGTTGCCGGACTCCAAGGACG-ACCTGATC-CGA 

70 80 90 100 110 120 130



60 70 80 90 100 110
GGGGGCAGCATGCAAACGAGA— AGGGTTGT-GCTCA — AGTCTGCGGC— CG CAGGAACTCTGCTGGGCG

i i ill t i i t i i i i t t i lit t i i i i t i ii iii i ti i i i i i i t 

ii iii i i i i i i i t i t i lit i i t i i i i ii ill i ii i t i i i t i 

CATTACA— CATTC— AACGATACCGACCTCTCGATCATCCGACAGCGGCGCGGGCCAGCCAATCGGCTGGGCT 
140 150 160 170 180 190 200



120 130 140 150 160 170 180 

GCCTGGCTG — GGTGCGCGACGTG-GCTGGATCGATCGGCA — CAGGCGATCGGATCAA-TA— CGTGCGCGT 

i ii it i i i ii it t*( ii ii ii i it tt lit ii it iii ii 

i i i t i t i i ii it tit i ' it it i ii it iii it t i iii it 

TCGCGG-TGCAGCTCTGTTACCTGCGCT ("TCCCG— GCGTCATCCTGGGCGTC— GATGAACTACCGTTTCCGC 
210 220 230 240 250 260 270 



190 200 210 220 230 240 250

CCTATCACAATCTCTGAAGCGGGTTTCACACTGACTCACGAGGACATCTGCGGCAGCTCGGCAGGATTCTTG

ill » i i t i i t i » i i i * t t it iiii i ii tilt ii i ii i i 

iti I t t I i t t t » ! i * t : t i ill I I ii iiii ii t it i i 

CCT TGT-TGAAGCCGGTC&CCrfiAC-CAGCCA-AAGGTCGGCGTCGAAAGCT-GGAACGAGTACGG 

280 290 300 310 320 330 



260 270 280 290 300 310 320
CGTGCTTGGCCAGAGTTCTTCGGTAGCCGCAAAGCTCTAGCGGAAAAGGCTGTGAGAGGATTGCG C 

t ii t iiii t ii iti : ii i til I ill iiii ii ii it i 

i ii t iiii t it tit t ii i lit i iii i iii ii it it i 

CCAGCGGGAGCAGA-CCCGGCGCGAGCACCTGAG — CGAGCTGCAAA — CCGTGTTCGGTTTCCGGCCCTTC 
340 350 360 370 380 390 400 



330 340 350 360 370 

GCCA-GAG CGGCTGG-CGTGCGAACGATTGTCGATGTGTCGA — CTTTC— GATATCGG — TCGCG 

lit iii iiii it iii t tit itt ii iii t iiii ii iii i 

iti iii iiii ii iii i » ■ i iii if lit i i it t ii iii i 

ACCATGAGCCATTACCGGCAGGCCGTCCAGATGCTGACCGAGCTGGCGATGCAAACCGACAAAGGCATCGTG 
410 420 430 440 450 460 470 

380 390 400 410 420 430 • 440 

-ACGTCAGTTTATTGGCCGAGGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGGCGACCGGCTTGTGGTTC 

I III III lit! I III 11 I II III! It It I III! I III 

I III Ttl II It I III II I II IIII It If I II II I I II 

CTGGCCAGCGCCTTGATCG— GGCACCTGCGGCGGCAGTCGGTCAT— TC— TGCC — CGCCCTCAACGCCG-TC 
480 490 500 510 520 530 540 



450 460 470 480 490 500 510 

GA-CCCGCCACTTTCGATGCGATTGAGGTATGTAGAGGAACTCACACAGTTCTTC — CTGCGTG — AGATTC 

ill iiii i ) i i t i i i i ii lit iti iii i i III ii i 

iii iiii iti iiii i i ii til iii iiii i iti ii i 

GAGCGGGCGAGTGCCGAGGCGA— TCACCCGTG CTAACCGGCGCA — TCTACGACGCCTTGGCCGAACC 

550 560 570 580 590 600 



520 530 540 550 560 570 580 

AAT-ATGGCATCGAAGACACCGGAATTAGGGCGGGCATTATCAAG-GTCGCGACCACAGGCAAGGCGACCCC

i i ii ii i i tit t ii i i i i i i i i ii iii ii i i i i i t i i i i i 

i t ii t i r i lit i it i i i i t t i i it iii it i i i i i t i i i i i 

ACTGGCGGACGCGCA— TCGCCGCCGCC1CGACGATC— TGCTCAAGCGCCGGGACAAC— GGCAAGACGACCTG 
610 620 630 640 650 660 670 



590 600 610 620 630 640 650

CTTTCAGGAGTTAGTGTTAAAGGCGGCCGCCCGGGCCAGCTTGGC — CA CCGGTGTTCCGGTAACCAC 

II II I It I t 1 1)1 It till lit! 11 lit I III I 

II II I II III lit it I I 1 I It t I II III I 111 t 

GTT GGCTTGGTTGCGCCAGTC-TCCGGCCAAGCCAAATTCGCGGCATATGCTGGAACACATCGAACGCC 

680 690 700 710 720 730 740



660 670 680 690 700 710 

TCA CACGGCAGCAAGTC— AGC— GC— GATGGTGAGCGA — GGCAGG-CCGCCATTTTTGAGT— CCGAAGC 

iii t i i t t i i i ii t i it i r i tit iti ti i iii i ii iti 

iii i i i i i i i i i i t i i i i i i iii itt ii iiii i ii iii 

TCAAGGCATGGCAGGCACTCGATCTGCCTnCCGGCATCGAGCGGCTGGTTCACCAGAACCGCCTGCTCAAGA 
750 760 770 780 790 800 810 



720 730 740 750 760 770 780 

TTGAGCCCTCACGGGTTTGTATTGGTCACAGCGATGATACTGAC — GATTTGAGCTATCTCACCGCCCTGCT 

it i i i i i iii i i tti t t ti i t i i itt iiii t i i i i t til 

ii i i i i i lit i i tit ti ii iii i iii iiii i it it i tit 

TT — GCCCGCGAGGGCGGCCAGATGACAC— CCGCCGA— CCTGGCCAAATTCGAGC CGCAACGGC — GCT 

820 830 840 850 860 870 



790 800 810 820 830 840 850 

GCGCGGATACCTCATCGGTCTAGACCAC'ATCCCGCACAGTGCGATT GGTCTAGAAGATAAT — GCGA-GTGC 

iii i iii t • i i i i i t i i t it iti i it it iii iii iii 

ill I lit ■ I I I i t i t ; t I ti 111 i It II ill III ill 

ACGCCACT — CTCGTGGCGCT— GGCCAC — CGAGGG.CATGGCCACCGTCACCGACGA— AATCATCGACCTGC 
880 890 900 910 920 930 940



860 870 880 890 900 910 

ATCACCGC-TCCTGEC-CATCCGT.r C&,-TGGC^,AACACGGSC-TC TCTTGATCAAGGCGCTCATCG 

t i i i i i i i i i t i i i i t • i it t t iii i it ii i ii t i i i i iii i 

■ i i i i t i t t i t i i t i iii t t t t tit i ii it i it t i t i i tit i 

ACGACCGCATCCTGGGTAAGCTGTTTA^CRCTGCCAAGAATAAGCATCAGCAGCAGTTCCAGGCG-TCA — G 
950 960 970 980 990 1000 1010



920 930 940 950 960 970 980

ACCAAGGCTACATGAAACAAATCCTCGTTTCGAATG-ACTGGCTGTTCGGGTTTTCGAGCT-AT-GTCACCA

i i i i i i iii ill t i i : ti it it ill iiii i i iii it i i iii 

i i i i i i iii iti tt t i i< ii ii lit iiit i i iii ii i i iii 

GC-AAGGC — CAT CAACGCCAAGGT ACGTCTGTACGGGCGCATCGG — TCAGGCGCTGATCGACGCCA 

1020 1030 1040 1050 1060 1070 



990 1000 1010 1020 1030 1040 
A— CA— TCATG6ACGTGATG- GATCf^— CG fiisAACCCCGACGGGATGGCCT— TCATTCC ACTGAGAGTGAT 

i ii iii ii ii iiit i t t t i i i it lit i i i i i i i t i i i i t i i 

i ti iti it ii i i i i t t i t i i i i ii lit t i i i i i i i i i i i i i i 

AGCAATCA-GGCCGCGATGCGTTTGCCGCCATCGAGGCCGTCATGTCTTGGGATTCCTTTGCCGAGAGCG-T 
1080 1090 1100 1110 1120 1130 1140



1050 1060 1070 1080 1090 1100 1110

CCCATTCTACGAG— AGAAGGGCGTCCCACAGGAAACGCTGCCAGGCATCACTGTGACTAACCCGGCGCGGTT 

ii i i I i i i t i i ii tii i iiii it ii tit i i i t i i i i 

ii t i i i t i i i i t i lit t iiii i i i i • i i i i i i i i i i 

CAC CGAGGCGCAGAAGCTCGCGCAACCCG— GTGGCTTC — GGTTTC— CTGCATCGCA— TCGGCG— AGAG 

1150 1160 1170 1180 1190 1200 1210 



1120 1130 1140 1150 1160 1170 1180

CTGTGTCACCGACT— TGCCG TGCATGACGCCAT — CTGGATCCTTC— CACGCAGCGGCCACTATTCCCC

ii i ill] it ttti i t i t ti t i it t i i i,it t i i i t i t t it 

II I I I t I II llll till It It II I I I til II IIII t I t I 

CTACGCCACC — CTGCGCCGCTATGCA — CCGGAATTCCTTG-CCGTGCTCAAGCTGCGGGCCGCGCCCGCC 
1220 1230 1240 1250 1260 1270



1190 1200 1210 1220 1230 1240 1250 

GTCAAGATACCGAACGATGAAGTCGCGCATC— GATCG — ATAGGCATCTTCAATGTGATCAGGGCTGCCACC 

i iii i i ttii t t i ii it it iii i i t i ii i i i i i i i i 

■ till i tilt iii ii ii ii lit iiii it i i t t I i i I 

GCCAAAAACGTGCTTGATGCCATTGAG.STGCTGCGCGGCATGAACACCGACAACGCCCGCA— AGCTGCCAGC 
1280 1290 1300 1310 1320 1330 1340 

1260 1270 1280 1290 1300 1310 
TCCAAAG— CCGG — TGGCCA — CCCCTGTCG-ATAGTCTTGAGGGACGGTAGCGACGACCGTGCTT — T 

Itt till! I tl III Ittt tl I lit I I t I I I I I 1 I I I 

til I till I t I tit I t I I II I II t I I I I I I I I t I I I 

CGATGCACCGACCGGCTTCATCAAGCCGCGCTGGCAGAAACT — GGTGATG — ACCGACG— CCG— GCATCGA 
1350 1360 1370 1380 1390 1400 1410



1320 X 
TCGTG — AACTGCAG 

iii iiii 

CCGGGCCT ACT ACG A AC TGTGCGCG 
1420 1430 
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Sorghum vulgars tbRNA -for phosphoenol pyruvate i rival 



LOCUS X17379 3147 bp RNA 15-SEP-1990
DEFINITION Sorghum vulgare mRNA for phosphoenolpyruvate involved in C4
photosynthesis (EC 4.1.1.31).
ACCESSION X17379
ORGANISM Unknown
Unclassified.
REFERENCE 1 (bases 1 to 3147)
AUTHORS Cretin,C., Keryer,E., Tagu,D., Lepiniec,L., Vidal,J. and Gadal,P.
TITLE Complete cDNA sequence of sorghum phosphoenolpyruvate carboxylase
involved in C4 photosynthesis
JOURNAL Nucleic Acids Res. «P. G58-65S (1990)
STANDARD unannotated staff_entry
REFERENCE 2 (bases 1 to 3147)
AUTHORS Cretin,C. D.
JOURNAL Unpublished. (1969) see COMMENT for author address
STANDARD unannotated staff_entry
BASE COUNT 675 a 946 c 915 g 611 t
ORIGIN



Initial Score = SO Optimized Score = 537 Significance = 12.71

Residue Identity = 51% Matches = 740 Mismatches = 490

Gaps = 203 Conservative Substitutions = 0

X 10 20 30 40 50

CTGCAGC-CTGACT-CGGCACCAGTCGCT — GCAAG—CAGAGTCGTAAGCAATCG— CAAGG
• ' iii t iii i it it i ii ii i i iii i ii i i i i i 

! 1 III I III I II It I II I I f f III I II I I I I I 

GTCGGGTGAAAGGCAAGCGCCCftCTGCTGCCCCCGGACCTTCCCATGftCCGAGGAGATCGCCGACGTCATCG 
1460 1470 1480 1490 1500 1510 1520 1530

60 70 80 90 100 110 120

GGGCAGCAT — GCAAACGAGAAG— GGTTGTGCTCAAG-TCTGCGGCCGCAGGAAC— TCTGCTGGGCGGCCTG 

ill ill ii i t lit > ii ii i i i i i i i i t i ill t i ii i 

iii lit i» t t iti i t t » t it iiiii t i i iii i i it i 

GCGC — CATGCGCGTCCTG6CCGAGCTCCCGATCGAGAGCTTCGGCCCCTACATCATCTCCATG — TGCACG 
1540 1550 1560 1570 1580 1590 



130 140 150 160 170 180 190 

GCTGGGTGCGCGACGTGGCTGGATCGATCG-GCACAGGCGATCGGATCAATACGTGCGCGTCCTATCACAAT 

III II lllll III I II lit II I I I 1 I I I I I III II I I I I 

II I II I I I t I lit I II (It II I II I ft I I I III II I I I I 

GC— GCCCTCG— GACGT-GCTCG— CCG-7CGAGCTCCTGC— AGCG CGAGATGTG— GCATTCGCCAGCGGT 

1600 1610 1620 1630 1640 1650 1660 



200 210 220 230 240 250 
CTC — TGAAGCGGGTTTCACACTGACTCACGAGGACATCTGC— GGCAGCTCGGCAGGATTCTTG CGTG 

it it I i i t t t ii i ► tit i t t i ill it tilt I i t II i 

it it lllll t t I 1 > III Itil lit 11 tilt t tilt I 

CCCCGTGGTGCCGCTGTTCGA — GAGGCTGGCCGAC — CTGCAGGCGGCGCGGCCG — TCCGTGGAGAAGCT 
1670 1680 1690 1700 1710 1720 

260 270 280 290 300 310 320 

CTTGGCCAGAGTTCTTCGGTAGCCGCAA — AGCTCTAGCGGAAAAGGCTGTGAGAGGATTGCGCGCCAGAGC 

lit Iti i it t t i i i iti i it i itt ii it I ii lit i ii i i 

■ it iii t it titi: tit i it tiit ii lit it tilt iii i 

CTTCTCCACTG-ACT — GGTA-CATCAACCACATC-AACGGCAA — GCAG-CAGGTGATGGTCGGCTACTCC 
1730 1740 1750 1760 1770 1780 1790



330 340 350 360 370 380 390

GGCTGGCGTGCGAACGATTGTCG-ATGT— GTCGACTTTCGATATCGGT— CGCGACGTCAGTTTA— TTGGCCG 

iii I iitt iii lit it i t i i i t t i i i i i i i i ■ t i t 

til i tiit tit iii iitt i i i i i t i i i i i i titii 

GACTCCGGCAAGGACGCCGGCCGCC7GTCCGCGECGT — GGCAGCTGTACGTGGCGCAGGAGGAGATGGCCA 
1800 1810 1820 1830 1840 1850 1860



400 410 420 430 440 450 460
AGGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGGCGACCGGCTTGTGGTTCGACC-CG CCACTTTCG 

i i i i tit ttti t i t i i il lit iiiii ii i i tl iii III 

titi iii iiit t tttt iiiii iiiii it it il iti t t i 

AGGT GGC-CAAGAAGT — ACGGCGTGAAGGTGAC CTTGTTCCACGGCCGCGGTGGCACCGTCG 

1870 1880 1890 1900 1910 1920 



470 480 490 500 510 520 
ATGCGATTGAGGTATGTAGAGGAACTCACACAGTTCTT — CCTG— CGTGAGATTCAAT — ATGG CATCG 

ii iii t i i i t i » i iii t i i t i it iii iii ■ i i i 

ii iii ii i ii i i t iii i t i i i it tit iii iiit 

— GCAGGGGCGTTGGCCCGACGCAC-CTCGCCATCCGTGCCCAGCCGCCGGACACCATCAACGGGTCCATCC 
1930 1940 1950 1960 1970 1980 1990

530 540 550 560 570 580 590
AAGACACCGGAATTAGGGCGGGCATTATCAAGGTCGCGACCACAGGCAAGGCGACCC CCTTTCAGGAGT 

I it i i i i t i I i i iii il il t i il iti it it lit i t i 

I it t i i t t t t i i ; t t ii ii t i it iii it ii i t i i i t 

TGGTGAC — GGTGCAGGGCGAG— STCATCGAGTTCATGTTC GGGGAGGAGAACCTGTGCTTCCAGTCTC 

2000 2010 2020 2030 2040 2050



600 610 620 630 640 650 660 

T — AG— TGTTAAAGGCGGCCGC-CCGGGCCA— GG— TTG — GCCACCGGTGT— TCCGGTAACCACTCACACGG 

i ii lit i til iii i t it ii it it ii iiiii i til t t t it t 

t ti iii t iti lit t i t t it ii ii it t t i i i t iti t i t ii i 

TGCAGCGGTTCACGGCCGCCACGCTGC-hGCACGCCATGCACCCGCCGGTCTCTCCCG-AGTGGCGCAATATG 
2060 2070 2080 2090 2100 2110 2120 



670 680 690 700 710 720 730

CAGCAAGTCAGCGCGATGGTGAGCGAGGCAGGCCGCCATTTTTGAGTCCGAAGCTTGAGCCCTC-ACGGGTT

lit i t t t ■ t i i i i i i ii i i it it i it till it 

lit I t I it i : t i i t t iiii it ii i I i iiii it 

GAGGAGATGCCCGTCGTCGCCAAGGnEG-AGTACGTCGTCGTCGTCAAGGAGCCGCGATTCGTCGAGTACTT 
2130 2140 2150 2160 2170 2180 2190



740 750 760 770 780 790 

TGTATTGGTCACAGCGATGATACTG ACG ATTTGAGCTATC— TCA— CCGCCCTGCTGCGCGGATACC 

ii ii ii i li> iiii iti i ill i til ii i i ii ti i ill ii 

ti ti it i iti ttti iti i ill i lit ti t i it it t iii'ii 

CAGATCGGCTAC — CCCTGAGACTGAGTACGGGAAGATGAAC-ATCGGCAGCAGGCCAGCCAAGAGGAAGCC 
2200 2210 2220 2230 2240 2250 2260



800 810 820 830 840 850

— TCATCGGTCTAGACCACATCCCGCACAGTGCGAT TGG TCTAGAAGATA— ATGCGAGTGCATC 

I 111 I 1(111 It II till It 111 III I It I ' t llll I I 

• iiii irtii ii ii i t i i i i tit ill i it i i iiitt t 

GGGCGGCGG— CATCACCAC — CCTGC GTGCCATCCCCTGGATCTTCTCGTGGACACAGACGAG— GTTCC 

2270 2280 2290 2300 2310 2320 2330 



860 870 880 890 900 910 920 

ACCGCTCCTGGGCATCCGTTCGTGGC — AAACACGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGCTAC 

ti iiii i t i i i i t i i t i t t i i i i i i i i t t t i i i i i i i t i ii 

t t till lilt I I I I 1 I 111 1 I I J I I I I I 1 I I I I I till! It 

AC — CTCC CCGT— GTGGCTGGGAGTCGGCGCCGCCT — TCAAGTGGGCCATCGA-CAAGG — AC 

2340 2350 2360 2370 2380 



930 940 950 960 970 980 990

ATGAAACAAATCCTCGTTTCGAATGACTGGCTGTTCGGGTTTTCGAGCTATGTCACCAACATCATGGACGTG

ii i i i ill ii iiii t iti i it t til it i i i i i i it 

iiii i lit ti it ti t lit i it t lit i i t i i i i i i i 

ATCAAGAACTTCCAGGTCCTCAAGGA GATGTACAACGAGTGGCCATTCTTCAGGGTCACCCTGGACCTG 

2390 2400 2410 2420 2430 2440 2450



1000 1010 1020 1030 1040 1050 1060 

ATGGATCGCGTGAACCCCGACGGGATGGCCT— TCATTCCACTGAGAGT GATCCCATTCT — ACGAGA 

iiii tl i il i i*ii i t i t t i t t i i it iti til iiii 

ttii ti i i i t t t » t ttt iiii i i i ii lit it i iitt 

CTGGAGATGGTTTTCGCC— AAGGGAGACCCTG.GCA7 TGC-CGGCTTGTACGACGAGCTGCTTGTTGCCGAG- 
2460 2470 2480 2490 2500 2510 2520 



1070 1080 1090 1100 1110 1120 1130

GAAGGGCGTCCCACAGGAAACGCTGC— CAGGCATCACTGTGACTA— ACCCGGCGCGGTTCTGTGTCACCGAC

■ it ill i i i t i i t i t i i t t it i i t t i it till i iitt 

tii iti tt it it ii iitt i ii t t t i i i i iitt t iiii 

GAACTCAAGCXJCTTTGGGftA-GCAGCTCAGGGACAAATACGTGGAGACACAGC-ftGCTTC — TCCTACAGA- 
2530 2540 2550 2560 2570 2580 2590



1140 1150 1160 1170 1180 1190

TTGCCGTGCATGACGCCA-TCT GGATCC-TTCC — ACGCAGCGGCCAC— TATTCCCCGT — CAAGA 

i t i t t t t t t ii tit i i t i i i i t i i i i i i i i i i t t i i i lit 

I II t III I I II ttl tilt! llll I I t I I III I I t I t I I I I 

TCGCTGGGCACAAGGACATTCTTGAAGGCGATCCATTCCTGAAGCAGGGGCTGCGTCTGCGCAATCCCTACA 
2600 2610 2620 2630 2640 2650 2660



1200 1210 1220 1230 1240 1250 1260 

T-ACCGAACGATGAAGTCGCGCATCGATCGATAGGCATCTTCAATGTGATCAGGGCTGCCACCTCCAA — AG 

iiii ii iiii lit tit it ti tii til iiii iii ii tit t 

iiii ii iitt iti iii t t it tii t i i i ■ t i iii ti iti t 

TCACC — ACCCTGAA — CGTG— TTCCftGGCCTA — CACGCTGAAGCGGATAAGGGACCCCAGCTTCAAGGTG 
2670 2680 2690 2700 2710 2720 2730



1270 1280 1290 1300 1310 1320
CCG — GTGGCCACCCCTGTCGATAGTCTTGAGGGACGGTAGC — GACGACCG TGCTTTTCGTGAACTGC 

ii i iti ii i i i t i t i ti tilt t i i i iii iii t i t i i t i 

ii i tii ti ttiii i i ii iiii ti it tit i t i t t i ■ i i i 

ACGCCGCAGCCGCCGCTGTCCAAQiSAGTTCGCCGACGAGAACAAGCCCGCCGGACTGGTGAAGCTGAACGGC 
2740 2750 2760 2770 2780 2790 2800 



X 

—AG 

i t 

t i 

GAGCGAGTACCGC 
X 2810 



6. LOW344-FIG1.SEQ

NEUTRP1 n. crassa tri-functional tryptophan biosynthesis gen



LOCUS NEUTRP1 2750 bp DNA PLN 01-AUG-1983
DEFINITION n. crassa tri-functional tryptophan biosynthesis gene trp-1.
ACCESSION J01252
KEYWORDS multi-functional enzyme.
SOURCE neurospora crassa.
ORGANISM Neurospora crassa
Eukaryota; Plantae; Thallobionta; Eumycota; Ascomycotina;
Pyrenomycetes; Sordariales; Sordariaceae; Neurospora; crassa.
REFERENCE 1 (bases 1 to 2750)
AUTHORS Schechtman,M.G. and Yanofsky,C.
TITLE structure of the tri-functional trp-1 gene from neurospora crassa
and its aberrant expression in escherichia coli
JOURNAL J. Mol. Appl. Genet. 2, 83-99 (1983)
STANDARD simple staff_review
BASE COUNT 572 a 821 c 727 g 630 t
ORIGIN



Initial Score = 85 Optimized Score = 594 Significance = 11.62

Residue Identity = 51% Matches = 738 Mismatches = 487

Gaps = 213 Conservative Substitutions = 0



X 10 20 30 40 50

CTGCAGCCTGACTCGGC-ACCAGTCGCTGCAAGCAGAGTCGTAAGCAATCG — CAAGGGGGC 

i i ill i tti i tiit it i it ii it i ill i till 

i i ill t iti t i i i i it i it iiit i lit i iiii 

ATCACAATGTCGTCCTCCTCAGTCGTCGACCACTCTCCCCA CGATTC-CGCTCCTTCGCCCCTGGTGCC 

260 270 280 290 300 310 320



60 70 80 90 100 110
A GCATGCAAACGAGAAGGGTTGTGCTCAAGTCTG CG-GCCGCAGGAAC— TCTGCTGG GCGGC 

i i I I I I I t t i ; til I it ii I i t i i • i i • i i til 

I lltltll I I I .Ittllt II i • I I I I I I I i I 111 

AACCGCCTCCAACC — TCATCCTCATCGACAACTATGATTCGTTTACCTGGAACGTCTACCAGTACCTCGTC 
330 340 350 360 370 380 390 



120 130 140 150 160 170 180

CTGGCTGGGTGCGCGACGTGGCTGGATCGATCGGCACAGGCGATCGGATCA — AT — ACGTGCGCGTCCTAT

ii i lit ii i I (ii t t it it ill lit i iiii it tii it t it i 

it i tii ii i i tit t t t t it tit iii t iiii ii tii ti i ti i 

CTCG-AGGGCGC-CAAGGTGACCG — TC-TTCCGCA ACGACCACATCACCATCGACGAGCTCATCGCA- 

400 410 420 430 440 450 



190 200 210 220 230 240 250

CACAATCTC-TGAAGCGGGT-TTCA-CACTGACTCACGAGGACATCTGCGGCAGCTCGGCAGGATTCTTGCG 

i i i i l III t i tit t tit iii t i i i i i i i ■ i i tit i i i i i i 

i ii i i itt t t iii i lit i ii i iiii ■ i i i i i t it tii lit 

AAGAACCCCACCCAGCTCGTCATCAGCCCTG— GGCCCG — GTCATC— CCGGCACCGACTCCGGTATCTCGCG 
460 470 480 490 500 510 520



260 270 280 290 300 310 320 

TGCTTGGC — CAGAGTTCTTCGGTAGCCGCAAAGCTCTAGCGGAAAAGGCTGTGAGAGGATTGCGCGCCAGA 

i ill iiii iiit i t i i lit li t iii ii i i i i i i tit 

i tii iiii i i t i tttt iii ii i lit ii i t ti i i iii 

CG— ATGCCATCAG— GCACTTC GCCGGCAAGATC— CCCATCTTTGGC— GT — GTGCATGGGCCAGCAGT 

530 540 550 560 570 580 



330 340 350 360 370 380 390 

GCGGC — TGGCGTGCGAACGATTGTCGATGTGTCGACTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCG 

t i i ti iii i i i i lit iiii ii iii iiii i ti it i i i i 

II ■ II 111 t I I I 111 llll II III till I II II I I I I 

GCATCTTTGACGT— CTATGG — CGGCGACGTGT — GCT-TCG CCGGT— GAGA— TTC — TGCACGGAAAG 

590 600 610 620 630 640 



400 410 420 430 440 450 460 

AGGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGGCGACCGGCTTGTGGTTCGACCCGCCACTTTC — GAT 

i iii ii iii tit i i it iiii iii it l iiii t i tit 

i itt ii iii tit i t it tiit tit it i iiii i i tti 

ACCTCTCCTCTGC-GCC-ACGACGGCAAGGGCGCATATGCCGGTCTGTCTCAGGATCTGCCAGTGACGAGAT

650 660 670 680 690 700 710



470 480 490 500 510 520 

GCGATTGAGGTATGTAGAGGAACTCA CACAGTTCTTCC TGCGT— GAGATTCAATATGGCAT — CGA

ill lit i i i : i i i i t i tit ii i i t i i t i i t i i i i i i i i 

tit t t i t i i i i i t tii tit ii ill i lit it i I ■ i i iti 

ACCACT CTCT— TGCCGGTACTC ATGTCACCCTTC — CCGAGTGCTTGGAGGTTACCTCTTGGATTGCGA 

720 730 740 750 760 770 780 



530 540 550 560 570 580 590 

A-GACACCGGAATTAGGGCGGGCATTATCAAGGTCGCGACCACAGGCAAGGCG-ACCCCCTTTCAGGAGTT- 

t I ( ill I t I i * i i ii ill ti itt t i I t I I ii ti ii it til 

i it lit i ti i iti it til it ill iiiii t ii it ii t t ill 

AGGAGGACGGTTCCAAGGGTGTCATCATGGGTGTC-CG-CCA CAAGGAGTACACCATTGAGGGTGTTC 

790 800 810 820 830 840 



600 610 620 630 640 650 660 

AGTGTTAAAGGCGGCCGCCCGGGCCAGCT — TGGCCACCG— GTGTTCCGGTAAC — CACTCACACGGCAGCA 

III' I I I t I III III I I I I I € I I I I I I I IIIII I I I I 

III I I I I I III III I I I I I I I I I I t I I IIIII till 

AGTTCCACCCGGAGAGTATTCTGTCTGCTGAGGGTCGTGGCATGTTCCGG— AACTTCCTTCACA— TGCAGGG 
850 860 870 880 890 900 910 



670 680 690 700 710 720 
AGTCAGCGCGATGGTGAGCGAG GC AGGCCGCCATTTTTGAGTC CGAAGCTTGA— GC 

it it i it ii tilt t t i i i i t i i i tii tilt i i t t 

lilt I II It III! II 1(111111 III I I I I II II 

AGGCACTTGGGCGGAGAACGAGAGACTGCAAAAGGCCGCCCAGGCACAGGCTGCCAACACAAAGTCCGACGC 
920 930 940 950 960 970 980 990 



730 740 750 760 770 780 790 

CCTCACGGGTTTGTATTGGTCACAGCGATGATACTGACGATTTGAGCTATCTCACCGCCCTGCTGC-GCGGA 

i lilt it i it t ii t I i iiiii i i i i i i i iiiii t iti 

t till ii t iiiii ti i i t < i i i i i i i t i i i i t i i i i i 

TCCCACGCCCAAGAAGAG CAAC-ATCCTTCAAAAGATTTACG CCCACCGTAAGGCTGCTGTGGA 

1000 1010 1020 1030 1040 1050



800 810 820 830 840 850 

TACCTCATCGGTCTAGACCACATCCCGCACAGTGCGATTGGTCT AGAAGATAAT— GCGAGTGCATCACC 

i i t i i i tii i iti i it t t i ii ti iitt t t t i i i i i i 

i i i t i i t t i tilt i it t ii it it tiit t i iiiii it 

T-GCTCA — GAAGCAGATTCCTTCC — CTGAGACCTTCTGACCTCCAAGCCGCTTATAACCTGAGCATCGCC
1060 1070 1080 1090 1100 1110 1120 



860 870 880 890 900 910 920

GCTCCTGGGCATCCGTTCGTGGCAAAC ACGGGCTCTCTTGATCAA— GGCGCTCATCGACCAAGGCTAC 

i i i i i lit it t i i t t ti t i i i i i i i i t t i t i tit 

t t I i i lit it ii t i I : i t i i i i i t t i i i i ■ i lit 

CCTCCT-CAAAT-CTCTCTTGTCGACCGTCTTCGCAATTCCCCCTTCGATGTCGCTCTTTGCGCCGAGAT-C 
1130 1140 1150 1160 1170 1180



930 940 950 960 970 980 990

ATGA-AACAAATCCTCGTTTCGAATGACTGGCTGTTCG — GGTTTTCGA-GCTATGTCACCAACATCATGGA 

• ii ii tii it if » it ii i it i t t t t i i i lit iii it ii 

tii it iii it ii t ii ii i it i t t it iii iii iii ti ii 

AAGAGGGCATCTCC CTCCAA— GGGTGTCTTTGCGCTTGATATTGACGCTCCGTC— GCAAGCTC — GCA 

1190 1200 1210 1220 1230 1240 1250



1000 1010 1020 1030 1040 1050 1060 

CGTGATGGATCGCGTGAACCCCGACGGGA-TGGCCT— TCATTCC — ACTGAGAGTGATCCCATTCTACGAGA

ii iti iii ii itt iti i ti i i it iii ii iii it iiiii i 

I I I f I III It til lit t II I I It III IIIII It IIIII I 

AGT-ATG CGCTTG CCGGCGGCAGTGTCATCTCGGTCCTGACCGAGCCAGA— GTGGTTCAAGGGCA 

1260 1270 1280 1290 1300 1310 



1070 1080 1090 1100 1110 1120

GAAGGGCGTCCCACAGGAAACGCTGCCAGG — CATCACTG — TGACTAACCCG— GCGCGGTTCTGTGTCACC 

ii i till i i tiii iiiii i i i i i i i i itt it iii i it 

it i ttii i i i i i i ■ t t i i i i t i t i i i itt ti iii i ii 

GCATCGATGACCTCCGTGCTGTCCGTC^GGTCCTTAACGGCATGCCCAACCGGCCCGCCGTCCTGCG-CAAG 
1320 1330 1340 1350 1360 1370 1380 



1130 1140 1150 1160 1170 1180

GACTTGCCGTGCATGACGCCATCTGGATCCT TCCA CGCAGCGGCCACTATTCCCC G 

i i i i ii i i t i i i i i i i j i iii ii ii i till tit tt i 

ii ii t i t t i i i i t i t i t i iii ii t i i i i i i i i i i i i 

GAGTT — CATCTTTGACGAGTACCAGATCCTCGAAGCCAGACTTGCCGGTGCTGACACTGTTCTCCTCATTG 
1390 1400 1410 1420 1430 1440 1450



1190 1200 1210 1220 1230 1240 1250 

TCAAGATACCGAACGATGAAGTCGCGCATCGATCG-ATAGGCA — TCTTCAATGTGATCAGGGCTGCCACCT



ii li 



TCAAGATGCTCGAGTATGA GCTCC rCuAGCGCCTATACAAGTACTCCTTGT— CTCTCGGCATGGAGC— 

1460 1470 1480 1490 1500 1510 1520

1260 1270 1280 1290 1300 1310
CCAAAGCCGGTGGCC — ACCCCTGTCGATA-GTCTTf GAGGGA-CGGTAGCGACGACCGTGCTTTT CGTG 

II II II t II lilt till! II II It It II till 111 

II II II III I I I I If I I I II II I I I I I I till III 

CCCTAGTCGAGGTCCAGAACACCGAGG.AGATGGCCACAGCCATCAAGCTCGGCG-CCAAGGTTATCGGCGTC 
1530 1540 1550 1560 1570 1580 1590

1320 X 
AACTGCAG 

lit I i 

tit i i 

AACAACCGCAATCTCGAG 
1600 X 1610 



7. LOW344-FIG1.SEQ

BLYAMYAA Barley alpha-amylase type A isozyme mRNA, complete

LOCUS       BLYAMYAA     1588 bp ss-mRNA            PLN       11-NOV-1985
DEFINITION  Barley alpha-amylase type A isozyme mRNA, complete cds.
ACCESSION   J01236
KEYWORDS    alpha-amylase; amylase.
SOURCE      Barley (Hordeum vulgare L. cv. Himalaya 1979 crop) aleurone cell
            stimulated with gibberellic acid, cDNA to mRNA, clone E.
  ORGANISM  Hordeum vulgare
            Eukaryota; Plantae; Embryobionta; Magnoliophyta; Liliopsida;
            Commelinidae; Cyperales; Poaceae; Hordeum; vulgare.
REFERENCE   1  (bases 1 to 1588)
  AUTHORS   Rogers,J.C. and Milliman,C.
  TITLE     Isolation and sequence analysis of a barley alpha-amylase cDNA
            clone
  JOURNAL   J. Biol. Chem. 258, 8169-8174 (1983)
  STANDARD  full staff_review
COMMENT     [1] suggests that there are two alpha-amylase genes in barley
            aleurone cells (types A and B), and that expression of these genes
            is affected differently by gibberellic acid. It is likely that
            alpha-amylase contains a signal peptide and a mature peptide. The
            latter would start at about position 172 (by comparison with
            alpha-amylase type B). A poly-A signal is located at positions
            1557-1562.
FEATURES             Location/Qualifiers
     mRNA            1..1588
                     /note="a-amy I mRNA"
     CDS             97..1413
                     /note="alpha-amylase type A, EC 3.2.1.1"
BASE COUNT      344 a    484 c    480 g    280 t
ORIGIN      95 bp upstream of NcoI site; chromosome 1.



Initial Score    =      107  Optimized Score =      592  Significance = 10.89
Residue Identity =      50%  Matches         =      718  Mismatches   =   533
Gaps             =      171  Conservative Substitutions           =     0



X 10 20 30 40 50

CTGCAGCCTGACTCGGC-ACCAGTCGCTGCAAGCAGAGTCGT--AAGCA— ATCGCAAGGGGG



GGTTGGCGTCCGGCCACC-AAGTCCTCT TTCAGGGGTTCAACTGGGAGTCGTGGAAGCAGAGCGGCGGGTGG
160 X 170 180 190 200 210 220



60 70 80 90 100 110 120

— CAGCATGCAAACGAGAAGGGTTGTGCTCAAGTCTGCGGCCGCAGGAACTC-TGCTGGGCGGCCTGGCTGG



TACAACATGATGATGGGCAAGGTCGACGACATCGCCGCTGCCGGAGTCACCCACGTCTGGCTGCCACCGCCG

230 240 250 260 270 280 290 



130 140 150 160 170 180 190

GTGCGCGACGTGGCTGGATCGATCGGCACAGGC — GATCGGATCAATACGTGCGCGTCCTATCACAATCTCT

ill t t i ii ttti i tit it i titi i i i i i iii ii iii i 

ill iii it ttti i iitti t t t 1 i l i t t i iii i i i t t i 

TCGCACTCCGT — CTCCAACGAAGGTTACATGCCTGGTCGG — CTGTACGACATCGACGCGTC— CAAGTACG 
300 310 320 330 340 350 360 

200 210 220 230 240 250 260 

GAAGCGGGTTTCACACTGA-CTCACG-AGGACATCTGCG-GCAGCTCGGCAGGATTCTTGC-GTGCTTGGCC

I I t t i t ii itti i t i t t t ill ii I tiiii i i lit i i i lit 

i i t t ■ I ti ttti t t iiiifitti I t i J I I I I til t i i iii 

GCAACGCG GCGGAGCTCAAGTCGCTCATCGGCGCGCTCCACGGCAAG-GGCGTGCAGGCCATCGCC 

370 380 390 400 410 420 

270 280 290 300 310 320 
AGAGTTCTTCGGTAGCCGCAAAGCTCTAGCG — GAAAAGGCTGTGAGAGG— AT— TGC-GC GCCAGAGC

li ^ii it tt > i lit tt t ttti t i t i t t i t t i t i i t t 

t t tilt it it tit it t t t i i t i t t i t i t i i i t i t i 

-GACATCGTCATCAAC — CACCGCTGCGCCGACTACAAGGATAGCCGCGGCATCTACTGCATCTTCGAGGGC
430 440 450 460 470 480 490

330 340 350 360 370 380 390 

GGCTGGCGTGCGAACGATTG — TCGA-TGTGTCGACTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGA 

111 i i i ( ill t ttll ii t i t t lit littiiiti it It till 

i t t t t t t tit I tilt it t I t i lit lit ittitt 11 It till 

GGC — ACCTCCG-ACGGCCGCCTCGACTGGGGC-CCCCACATGATC TGTCGCGACGACACCAAATACTCCGA 
500 510 520 530 540 550 560 

400 410 420 430 440 450 460 

GGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGGCGAC — CGGCTTGTGGTTCGAC— CCG— CCACTTTCGA 

III t I I t I t lit I I t t t I I 111 I till It lit! Iff 

I I I 111 III tit 1 I i I t 1 1 111 I tilt f ( I 1 I I It I 

TGGC — ACCGCAAACCTCGACACC — GGAGCCGACTTCGCCGCCGCGCCCGACATCGACCAC— CTCAA 

570 580 590 600 610 620 

470 480 490 500 510 520
TGCGATTGAGGTATGTAGAG-GAACTCACACAGT TCTTCCTGCGTGAGATTCAATATGGCATCGA 

lti i lit t I t t t i I i I lit till t I til I lit t t t t 

III ill] tlttiittl til tl tt It tit I tiltltl 

— CGACCG— GGT — CCAGCGCGAGCTCAAGGAGTGGCTCCTCTGGCTCAAGAGCGACCTCGGCTTCGACGCG 
630 640 650 660 670 680 690 

530 540 550 560 570 580 590

AGACACC— GGAATTAGGGCGGGCATTATC— AAGGTCGCGACCACAG— GCAAGGCGAC— CCCCTTTCAGGAG— 

till t t t i I tit i t i it t t t i tti ttit i t t i lit 

till it it i tit i t i ii iitt ill tilt t t i t iii 

TGGCGCCTTGACTTCGCTAGGGGCTACTCGCCGGAGATGGCCAAGGTGTACATCGACGGCACATCCCCGAGC 
700 710 720 730 740 750 760 

600 610 620 630 640 650 660 

TTAGTGTTAAAGGCGGCCGCCCGGGCCAGCTTGGCCACCGG-TGTTCCGGTAACCACTCACACG-GCAGCAA

it i ttit t t i i t i t i i i t t i t t t i t t t it t l tti tilt 

ii i tit i t t i t t t i i t i i t t t i i i til it > t iii titt 

CTCGCCGT GGCCGAGGTGTGGGACAATATGGCCACCGGCGGCGACGGCAAGCCCAACTACGACCAGGAC 

770 780 790 800 810 820 830 

670 680 690 700 710 720 730 

G-TCAGCG — CGATGGTGAGCGAGGCAGGCCGCCATTTTTGAGTCCGAAGCTTGAGCCCTCACGGGTTTGTA

i iiii it i t i t i it lit it ii tit it ttti ii i 

t iiii t t t i t t t it itt ii it i i t it t i t t it t 

GCGCACCGGCAGAATCTG— GTGAACTGGGTGGACAAGGTGGGCGGCGCGGCCTCGGCAGGCATGGTGTTCGA
840 850 860 870 880 890 900

740 750 760 770 780 790 800 

TTGGTCAC— AGCGATG — ATACTG— ACGATTTGAGCTATCTCACCGCCCTGCTGCGCGGATACCTCATCGGT 

I 1 1 I I t 1 1 I 111(11 t t t I II I 1 11 III t t I I II lilt 

I t I t t 1 1 t 1 I I t 1 I t III 1 II I 1 1 1 111 I lit II lit) 

CT — TCACGACCAAAGGGATACTGAACGCT GCCGT-GGAGGG-CGAGCT — GTGGA— GGCTGATCGAC 

910 920 930 940 950 960 



810 820 830 840 850 860 870 

CTAGACCACATCCCGCACAGTGCGATTGGTCTAGAAGATAATGCGAGTGCATCACCGCTCCTGGGCATC — C 



CCGCAGGGG A AGL-.CCC« XGl}.i. 'GTGi^— TTsGRAT Gfa— 7 GGCCGGCCAAGGCCGCCACC-TTCGTCGACA ACCAC 
970 980 990 1000 1010 1020 1030



880 890 900 910 920 930
GTTCGTGGC-AAACACGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGC-TACA-TGAAAC AAATC

t i lit ii i ill i ii ii i til i i t i i i i i t i i it i i i i 

t i ill it t tit i it t ■ i i t i till i i i i ii i ii till 

GATACAGGCTCCACGCAGGC-CATGTGGCCATTCCCCTC — CGACAAGGTCATGCAGGGCTACGCGTACATC 
1040 1050 1060 1070 1080 1090 1100

940 950 960 970 980 990 1000

CTCGTTTCGAATGACTGGCTGTTCGGGTT-TTCGAGCTATGTCACCAACATCATGGACG-TGATGGATCGCG

ill ii i il ti t ill t ii ii I I I i til i i i lit i 

ill ii t ii it i ill i ii it i i i i ill i i i iti i 

CTCACCCACCCCGGCATCCCATGCATCTTCTACGA-CCATTTCTTCAAC TGGGGGTTTAAGGA CC 

1110 1120 1130 1140 1150 1160 1170

1010 1020 1030 1040 1050 1060 1070

TGAACCCCGACGGGATGGCCTTCATTCCACTGAGAGTGATCCCATTCTACGAG — AGAAGGGCGTCCCA

t i i i i i i i t t tit i > t tit i * i i i t i titt i t i 

t I I I I I I t I t Til t I I lit I t I t t I I t I t I III 

AGATCGCGGCGCTGGTGGCGATCAGGAAGCGCAACGGCATCAC-GGCGACGAGCGCTCTGAAGATCCTCATG 
1180 1190 1200 1210 1220 1230 1240 

1080 1090 1100 1110 1120 1130 1140

CAGGAA— ACGCTGCCAGGCATCACTGTGACTAACCCGGCGCGGTTCTGTGTCACCGACTTGCCGTGCATG-A 

it ill i i i i i ■ ii t i it it i i i i iti ii ii i ti i i t i ii i i 

t t iii i i i t i t ii i t t i t t i i i i iti ii it i t l i i ii ii i i 

CACGAAGGAGATGCC-TACGTCGCCGAGA-TA-GACGGCAAGGTGGTG-GTGA-AGA-TCG-GGTCCAGGTA 
1250 1260 1270 1280 1290 1300

1150 1160 1170 1180 1190 1200 1210

CGCCATCTGGATCCT — TCCACGCAGCGGCCACTATTCCCCG — TCAAGATACCGAACGATGAAGTC-GCGC

t i i ii ti t iii it tit i ii ti ill i ii ii i t i i i t i 

ititiii t itt tt itt i till iti i i i > i i t iiiii 

CGACGTCGGGGCGGTGATCCCGGC — CGGGTTCGTGACCTCGGCACACGGCAACG-ACTACGCCGTCTGGGA 
1310 1320 1330 1340 1350 1360 1370 

1220 1230 1240 1250 1260 1270 

ATCGATCGAT AGGCA — TCTTCAA— TGTGATCAGGGCTGC— CACCTCC— AAAGCCGGT—GGCCACCCC 

t i i i I iiii t i t t i i i i i t i i i i i iii i itii ii 

I I It I till II III I I I t I till I lit I llll II 

GAAGAACGGTGCCGCGGCAACACTACAACGGAGCTGAAGTCTGCACTGATCCGTCATTCGATCGAGCATGAA
1380 1390 1400 1410 1420 1430 1440 

1280 1290 1300 1310 1320 X 

TGTC — GATAGT-CTTGAGGGAC GGTAGCGACGACCG TGCTTTTC-GTGAACTGCAG 

111 II III t III I I (III tit I I llll I III I 1 

111 I I I I t I I I I || lilt tltll llll I tit II 

TTTCCTGA— AGTACATGATTCACTTCTGGTATTCACG— CGGATATGATTAACTATGTATACCTGTACCCAAA 
1450 1460 1470 1480 1490 1500 1510

AT 
1520 



8. LOW344-FIG1.SEQ

PDUMER Plasmid pDU1358 (from S. marcescens) mercurial resi

LOCUS       PDUMER       2153 bp ds-DNA             BCT       15-MAR-1988
DEFINITION  Plasmid pDU1358 (from S. marcescens) mercurial resistance (mer)
            operon encoding organomercurial lyase (merB), mercury resistance
            protein (merD), complete cds, and mercury reductase (merA), 3' end.
ACCESSION   M15043
KEYWORDS    antibiotic resistance; mercuric reductase; mercury resistance;
            organomercurial lyase; transport protein.
SOURCE      Plasmid pDU1358 (multi-antibiotic resistance IncC incompatibility)
            DNA (from Serratia marcescens), clone pHGS.
  ORGANISM  Plasmid pDU1358
            Prokaryota; Bacteria; Plasmid pDU1358.
REFERENCE   1  (bases 1 to 2153)
  AUTHORS   Griffin,H.G., Foster,T.J., Silver,S. and Misra,T.K.
  TITLE     Cloning and DNA sequence of the mercuric and organomercurial
            resistance determinants of plasmid pDU1358
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 84, 3112-3116 (1987)
  STANDARD  full staff_review
COMMENT     Computer-readable sequence for [1] kindly provided by S. Silver,
            25-MAR-1987.
FEATURES             Location/Qualifiers
     CDS             <1..359
                     /note="mercuric reductase (merA; AA at 3)"
     CDS             374..1012
                     /note="organomercurial lyase (merB)"
     CDS             1124..1489
                     /note="mercury resistance protein (merD)"
BASE COUNT      391 a    641 c    680 g    441 t
ORIGIN      Unreported.



Initial Score    =       95  Optimized Score =      590  Significance = 10.17
Residue Identity =      50%  Matches         =      727  Mismatches   =   511
Gaps             =      194  Conservative Substitutions           =     0



X 10 20 30 40 50

CTGCAGCCTGACTC— GGCACCAGTCGCTGCAAG— CAG — AGTC— GTAAGCAATCGCA — AGG 

ii ii t t it t (ii ii i ttii iiii ii it 

it ii i i tt t tit ii i * tii ii it ii it 

TGGCGGTGTCCTTQGTATTGCCeCAQGA(=)GCPiGCC6ACGTTCSTCAGTCCTTCTGTTGCCATGTACATTTCT 

800 X 810 820 830 840 850 860



60 70 80 90 100 110 120

GGGCAGC-ATGCAAACG — AGAAGGGTTGTGCTCAAGTCTGCGGCCGCAGGAACTCTGCTGGGCGGCCTGGC 

lilt li ill t i t i it i i i t i t i i i l i i iiii ill 

itil t i tit t t i t ii i t t i t t iiit i t t t t i lit 

TTGCATCTGTCCCGACGGCGGAAGACTGGGCCTCCAAGCATCAAGGATTGGAA GGATTGGCGATC— GTC

870 880 890 900 910 920 930 



130 140 150 160 170 180 190

TGGGTGCGCGACGTGGCTGGATC— GATCGGCACAGGCGATCGGATCAATACGTG— CGCGTCCTATCACAATC 

ill t i i i i i i i ft i tit tilt i i til ii ii tt lit i tit 

tl< tilt tilt til I LI tilt It III I I II II lit 111) 

AGTGT CCACGAGGCT — TTCGGCTTGGGCCAGGAG— TTTAATCGACATCTGTTGCAGACCATGTC— ATC 

940 950 960 970 980 990



200 210 220 230 240 250 260 
TCTGA-AGCG-GGTTTCACACTGACTCACGAGGACATCTGCGGCAGCTCGGCAGGATTC-TTGCGTGC T 

i ttiiiit it iti tt t i titititti i i i t i i tttti t 

i i i i t i i i ii t % i t i t t tit fiitt t i i t t i i tit t t i 

TAGGACACCGTGATCGGATATCGACCCA — ATG — TTCTACGGCACCGGCATCGGATTCGCAGCGCGCGGAT 
1000 1010 1020 1030 1040 1050 1060 

270 280 290 300 310 320 

TGGCCAGAGTTCTTCGGTAGCCGCAAAGC — TCTAGCG — GAAAAG — GCTGTGAGAGGATTGCGCGCCAGA

ii i i * i i i i tit it t fit t i t t i iiii iiii lit t i 
it i t i i i t i lit ii i lit i t i t i ■ iti ii t i tit ii 

TGAACTCGGCGAAACGGTATATGCATTGCCGTGAACCGACCAAAAGGAGGTGTTCGATGAACGC— CTACACG
1070 1080 1090 1100 1110 1120 1130



330 340 350 360 370 380 390

GCG-GCTGGCGTGCGAACGATTGTCGATGTGTCGACTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGA 

■ i i lit it it it t t tit t t t ititit t t t t t i t i ii t t 

11 tilt t t It it t r III t t i rttlll t t t I 1 I i I t I t I 

GTGTCCCGGCTGGCCCTTGA-TGCCG — GGGTGAGCGTGC— ATATCGTGCGCGACTAC CTGCTGCGCG— 

1140 1150 1160 1170 1180 1190 1200 



400 410 420 430 440 450 460

GGTTTC-GCGGGCTGCCGACGTTCATATCGTGGCGGCGACCGGCTTGTGGTTCGAC-CCGCCACTT TCG 

• it i i i t i i i I* i i t i it t i iii i tiit tit tii iiii iti it 

i ii i iiii t ■ it i i tt it it tit i iiii lit iii iiii ttt ii 

GATTGCTGCGGCCAGTCG— CCTGCACCACG— GGTGGCTA— CGGCCTGTTCGATGACGCCGC — CTTGCAGCG 

1210 1220 1230 1240 1250 1260 

470 480 490 500 510 520

A-TGCGAT TGAGGTATG TAGAGGAACTCACACAGTTCTTCCTGCGTGAGATTCAATATGGCATCGA

I t t l 1 tit! t | t t 1 I | || t It I 111 III I I 

t I I I t IIII I 1 t t t 1 I It t II I tit 111 t t 

ACTGTGCTTCGTGCGGGCCGCCTTCGAbiG — CGGGCATCGGCCT — CGGCG CATTGGCGCGGCTGTGCC 

1270 1280 1290 1300 1310 1320 1330 



530 540 550 560 570 580 590

AGACACCGGAATTAGGGCGGGC-ATTATCAAGGTCGCGACCA-CAGGCAAGGCGACCCCCTTTC-AGGA — G
> ■ t t i i i i i i i i iiit i it iii it ii i i iii ii i 

• t i iii i i i i i i it ii i i i lit ii it i i iii ii i 

GGGCGCTGGA — TGCGGCGAACTGCGATGAAACTGCCGCGCAGCTTGCTGTGCTGCGTCAGTTCGTCGAACG

1340 1350 1360 1370 1380 1390 1400



600 610 620 630 640 650 660

TTAGTGTTAA-AGGCGGCCGCCCGGG CCAGCTTGGCCACCGGTGTTCCGGTAACCACTC — ACACGGC

■ • 1* > i t f i tit t i t iiiiii it it lit til i i tit it 

i * * ' i lilt tit tti ititit ii it iii lit t i iii ii 

CCGGCGCGAAGCGTTGGCCAATCT«aGftftGTGCAG-TTGGCCGCC-ATG — CCG ACCGCGCCGGCACAGC 

1410 1420 1430 1440 1450 1460 



670 680 690 700 710 720
A-GC-AAGTCAGCGCGATGGTGAG CGAG-GCAGGCCGC.CATTTTTGAGTCCGAAGC — TTGAGC 

i i i it i i i i t t i i i i i i i i i i t i i ii i i t i i fiii 

iii t i i i i t i ii i i t i i i i i i i i i ii i i i t t ii t i 

ATGCGGAGAGTTTGCCATGAACAC-XXCCGAGCGCATGCCGGC CGAGACACACAAGCCGTTCACCGGCTA 

1470 1480 1490 1500 1510 1520 1530 



730 740 750 760 770 780 790

CCTCACGGGTTTG-TATTGGT-CACAGCGATGATACTGACGATTTGAGCTATCTCACCGCCCTGCTGCGCGG 

III i i i l t t lit I i> t it it) i i i i l t i i t i it t t i t t ill 

iii iiit t i iii > it t t i lit i i t i t t iiit ti i i i t i iii 

CCTGTGGGGTGCGCTGGCGGTGCTCACC — TGTCCCTGTC— ATTTGCCGATTCTCGCCATTGTGCTGGCCGG 
1540 1550 1560 1570 1580 1590 1600



800 810 820 830 840 850 860 

ATACCTCATCGGTCTAGACCACATCCCGCACAGTGCGATTG — GTCTAGAAGATAATGCGAGTGCATCACCG 

i t i t i t i t i t i t t lift it i i i i ii iii i 

i i t i i i i i i t i i i iiii it tiii it iii i 

CACGAAGGCCGGTGCGTTCATCGGACASCACTGGGGTATTGCAGCCCTCACGCTGA-CCG GCTTGTTTG 

1610 1620 1630 1640 1650 1660 1670



870 880 890 900 910 920

CTCCTGGGCATCCGT— TCGTGGCAAACACGGGCTCTC TTGATCAAG-GCGCTCATCGACCAAGGCTA

tllll lilt till) l!llt It I I I I I t I I I I I I I till I 

i i i i I tilt i i i t t t i t i i i : t I i i t t i t i i i t i iiii t 

— TCCTG TCTGTGACGCGGCTGCTGCEGGCCTTCAGAGGTCGATCATGAGCGCTTCCCAGCCAA — TTG 

1680 1690 1700 1710 1720 1730 1740 



930 940 950 960 970 980 990

CATGAAACAAATCCTC — GTTTCGAATGACTGGC— TGTTCGGG-TTTTCGAGCTATGTCACCAAC — ATCAT 

III III II i it I I I i ? i iiii iiiiii iiii i ii I 

iii iii ti i ittiiit i iiit t i i i i i iiiii it i 

AATG— GACAGTGGCGCAACTGGCGCA— GGCGGCCGAGCGCGGGCAGCTTGAGCTGCACTACCAGCCGATTGT 
1750 1760 1770 1780 1790 1800 1810 



1000 1010 1020 1030 1040 1050 1060

GGACGTGATGGATCGCG— TGAACCCCGACG-GGATGGCCTTCATTCCACTGAGAGTGATCCCATTCTACGAG 

it ii ti it it ti ti lit i iii it i tit i it iiii' i ii it 

ti ii t t i i it it ti i t i i iii it i iii i ti iiii t ii it 

CGATTTG— CGCAGTGAGCAGATTGTCGGCGCGGAAGCCCT — GTTGCGCTG— GCGTCATCCGACGCTCGGAC 
1820 1830 1840 1850 1860 1870



1070 1080 1090 1100 1110 1120
AGAAG — GGCGTCCCACAGGAAACGC-TG — CCAGGCATC ACTG-TGAC— TAACCCGGCGC GGTTC

i i tit tit t i t i t l iii ill it i ii i t i i i t iii i 

I I III lit I I t I t I III iiiiii i i iiiiii iiii 

TGTTGCCGCCGGGCCAGTTCCTGCCCGTGATCGAATCGTCCGGCCTGATGCCGGAAATCGGCGCATGGGTGC 
1880 1890 1900 1910 1920 1930 1940 1950 



1130 1140 1150 1160 1170 1180
TGTGTCACCGACTTGC;CGTGCATGACGC:CATCTGGATCCTTCCACGCAGCGGCCACTATTCC CCGT 

>■ i < l t t I I I t I I | III I Till | | III 111 It llll I I 1 1 

II I I I I t IIIIII I lit t I t t I I I III III II I t I t IIII 

TG-GGCGCAG— CCTGCCGTCAAATIBCGCGA— CTGG— CGGGTGCTGGCA— TGGCAACCGTTCCGGCTGGCCGT 
1960 1970 1980 1990 2000 2010 



1190 1200 1210 1220 1230 1240 1250 

CAAGATACCGAACGA— TGAAGTCGCGCATCGATCGATAGGCATCTTCAATGTGATCAGGGC— TGCCACCTCC

• it t ii itt iiii i t i i t i iiii iiii iiiii iii ii 

III I II III I I t 1 I 1 I III llll IIII tllll III II 

CAATGTTTCG— GCGAGCCAAGTGGC-.IGC-nAGAT — TTCGACA AGTGGGTAAAGGGCGTGC — TGGCC 

2020 2030 2040 2050 2060 2070



1260 1270 1280 1290 1300 1310 1320

AAAGCCHSGTGCCCACCCCTGTCGATAOr-TCT rGAG-GGACGGTAGCGACGACCGTGCTTTTCGTGAACTGC 

' ill ill tti t I t t ti i i t i r lit ■ I it iii t t i i i i i i 

' ill tit lit it it i t t i I i tti 1 I ii ill ii i i i i i i 

GATGCCGGGTTGCCCGCCGCGTATCT7 0,';AAA7 TGAGiCTGACCGAATCG GTTGCGTTCGGTGATCCGG 

2080 2090 2100 2110 2120 2130 2140 



X 
AG 

i 
i 

CGATCTTC 
2150 



9. LOW344-FIG1.SEQ

BLYAMY2 Barley (H. vulgare) alpha-amylase 2 mRNA, complete

LOCUS       BLYAMY2      1588 bp ss-mRNA            PLN       15-JUN-1988
DEFINITION  Barley (H. vulgare) alpha-amylase 2 mRNA, complete cds.
ACCESSION   M17128
KEYWORDS    alpha-amylase.
SOURCE      Barley (H. vulgare cv Sundance), cDNA to mRNA, clone E.
  ORGANISM  Hordeum vulgare
            Eukaryota; Plantae; Embryobionta; Magnoliophyta; Liliopsida;
            Commelinidae; Cyperales; Poaceae; Hordeum; vulgare.
REFERENCE   1  (bases 1 to 1588)
  AUTHORS   Knox,C.A.P., Sonthayanon,B., Chandra,G.R. and Muthukrishnan,S.
  TITLE     Structure and organization of two divergent alpha-amylase genes
            from barley
  JOURNAL   Plant Mol. Biol. 9, 3-17 (1987)
  STANDARD  full staff_entry
COMMENT     Draft entry and computer-readable sequence for [1] kindly provided
            by S. Muthukrishnan, 22-SEP-1987.
FEATURES             Location/Qualifiers
     mRNA            1..1588
                     /note="alpha-amylase 2 mRNA (alt.)"
     mRNA            2..1588
                     /note="alpha-amylase 2 mRNA (alt.)"
     mRNA            3..1588
                     /note="alpha-amylase 2 mRNA (alt.)"
     mRNA            4..1588
                     /note="alpha-amylase 2 mRNA (alt.)"
     CDS             97..1413
                     /note="alpha-amylase 2"
BASE COUNT      343 a    484 c    480 g    281 t
ORIGIN      380 bp upstream of SstI site.



Initial Score    =      107  Optimized Score =      590  Significance = 10.17
Residue Identity =      50%  Matches         =      714  Mismatches   =   538
Gaps             =      169  Conservative Substitutions           =     0



X 10 20 30 40 50 

CTGCAGCCTGACTCGGC-ACCAGTCGCTGCAAGCAGAGTCGT — AAGCA— ATCGCAAGGGGG 

■ < ■ < i iii i iii ii i i i i i i i i i i i i i i i i ii ii 

■ • * iii i iii ii i i i i i i i i t * i i i i i i it ii 

GGTTGGCGTCCGGCCACC-AAGTCCTCT TTCAGGGGTTCAACTGGGAGTCGTGGAAGCAGAGCGGCGGGTGG 
160 X 170 180 190 200 210 220

60 70 80 90 100 110 120

— CAGCATGCAAACGAGAAGGGTTGTGCTCAAGTCTGCGGCCGCAGGAACTC— TGCTGGGCGGCCTGGCTGG
i i i i i • i i i i i i * i it i it i ii it itt i iii iii i 

* i » t i i itiiiiti ii t it i ti ii iii t iii iii i 

TACAACATGATGATGGGCAAGGTCGACGACATCGCCGCCGTCGGAGTCACCCACGTCTGGCTGCCACCGCCG
230 240 250 260 270 280 290



130 140 150 160 170 180 190

GTGCGCGACGTGGCTGGATCGATCGGCACAGGC — GATCGGATCAATACGTGCGCGTCCTATCACAATCTCT



TCGCACTCCGT — CTCCAACGAAGGTTACATGCCTGGTCGG --CTGTACGACATCGACGCGTC-CAAGTACG
300 310 320 330 340 350 360 



200 210 220 230 240 250 260 

GAAGCGGGTTTCACACTGA-CTCACG-AGGACATCTGCG-GCAGCTCGGCAGGATTCTTGC-GTGCTTGGCC 

i i i i t ■ it ttit t i i i I i i } i i i t i i i i t i t i Iti 

till! I t I I t I I I * t 1 t I 111 II I (till t ■ III t I I t t I 

GCAACGCG GCGGAGCTCAAGTCGCTCATCGGCGCGCTCCACGGCAAG— GGCGTGCAGGCCATCGCC 

370 380 390 400 410 420 

270 280 290 300 31 0 320 
AGAGTTCTTCGGTAGCCGCAAAGCTCTAGCG — GAAAAGGCTGTGAGAGG— AT— TGC— GC GCCAGAGC

ii it it i i it lit i : i i t i i i i i i i i i i i i i it ii 

ii i i i i ii ii iti tt tiiiti i i i i i t i i i i i i i i 

— GACATCGTCATCAAC — CACCGCTGCGCCGACTACAAGGATAGCCGCGGCATCTACTGCATCTTCGAGGGC
430 440 450 460 470 480 490

330 340 350 360 370 380 390

GGCTGGCGTGCGAACGATTG — TCGA— TGTGTCGACTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGA 

■ ii i i t i ill i i i i i it i i i i ii> i i i i t t i t i t i ii i i t i 

III I I It III I lit! It I I t I til I I I I I I I I I II I I I I I I 

GGC — ACCTCCG— ACGGCCGCCTCGACTGGGGC— CCCCACATGATCTGTCGCGACGACACCAAATACTCCGA
500 510 520 530 540 550 560

400 410 420 430 440 450 460 

GGTTTCGCGGGCTGCCGACGTTCATATCGTGGCGGCGAC — CGGCTTGTGGTTCGAC— CCG— CCACTTTCGA 

ill ill lit lit t i I i i t I til t i I I i ii till it) 

III ill lit lit i ( i i I I I til t till i i i t I I it t 

TGGC — ACCGCAAACCTCGACACC — GGAGCCGACTTCGCCGCCGCGCCCGACATCGACCAC— CTCAA

570 580 590 600 610 620

470 480 490 500 510 520
TGCGATTGAGGTATGTAGAG-GAACTCACACAGT TCTTCCTGCGTGAGATTCAATATGGCATCGA 

ill i tit t i i i i t • i i iti lilt it tti i iti i i i i 

lit i t i t i t i ( t i t r i lit ttit it iti i til iiii 

— CGACCG-GGT — CCAGCGCGAGCTCAAGGAGTGGCTCCTCTGGCTCAAGAGCGACCTCGGCTTCGACGCG 
630 640 650 660 670 680 690 

530 540 550 560 570 580 590 

AGACACC— GGAATTAGGGCGGGCATTATC— AAGGTCGCGACCACAG— GCAAGGCGAC— CCCCTTTCAGGAG— 

I i t i I i i i i ttt i : f it i ill ill ttii iiii tit 

lilt I I I I I 113 I It tl lilt III iiii iiii tit 

TGGCGCCTTGACTTCGCTAGGGGCTACTCGCCGGAGATGGCCAAGGTGTACATCGACGGCACATCCCCGAGC
700 710 720 730 740 750 760 

600 610 620 630 640 650 660 

TTAGTGTTAAAGGCGGCCGCCCGGGCCAGCTTGGCCACCGG-TGTTCCGGTAACCACTCACACG-GCAGCAA 

ii i i t t i i tit ft : i i i t i i i t t i til it i i tit iiii 

ii i iti i i itt it iittititit i tit ii t i iti tit t 

CTCGCCGT GGCCGAGGTGTGGGACAATATGGCCACCGGCGGCGACGGCAAGCCCAACTACGACCAGGAC

770 780 790 800 810 820 830

670 680 690 700 710 720 730

G-TCAGCG — CGATGGTGAGCGAGGCAGGCCGCCATTTTTGAGTCCGAAGCTTGAGCCCTCACGGGTTTGTA

i it ii ii ii t ft ft t ; t t : i i iti it it t i ii i 

i it ii ti it i it it tii it ii iti ii tiit ii i 

GCGCACCGGCAGAATCTG-GTGAACTGGGTGGACAAGGTGGGCGGCGCGGCCTCGGCAGGCATGGTGTTCGA 
840 850 860 870 880 890 900

740 750 760 770 780 790 800 

TTGGTCACAGCGATGATACTGACGATTTGA — GCTATC-TCACCGCCCTGCTGCGCGGATACCTCATCGGTC 
t iti tit it it titi t t i ii ii tii i i t i ii iiii * 

I tit III II It IIII t <! I It II III I III II IIII I 

CT — TCA CEACCAAAGGGA — TTCYGAACGCTGCCGTGGAGGGCGAGCT — GTGGA— GGCTGATCGACC 

910 920 930 940 950 960 970

810 820 830 840 850 860 870

TAGACCACATCCCGCACAGTGCGATTGGTCTAGAAGATAATGCGAGTGCATCACCGCTCCTGGGCATC — CG

i i ttttiii ti tt i tti i Iiii i i t t t i i * i t 

i i i i i t I t t ; t ■ t t i ill i tilt i i l l i i i ii 

CGCAGGGGAAGGCCCCCGGJJGTGA-- !^r.s, f ':lATC,f;r-TGGCCG;GCCAftGGCCGCCACC— TTCGTCGACAACCACG 
980 990 1000 1010 1020 1030



880 890 900 910 920 930
TTCGTGGC—AAACACGGGCTCTCTTGATCAAGGCGCTCATCGACCAAGGC— TACA— TGAAAC AAATCC



ATACAGGCTCCACGCAGGC-CATGTGGCCATTCCCCTC — CGACAAGGTCATGCAGGGCTACGCGTACATCC
1040 1050 1060 1070 1080 1090 1100



940 950 960 970 980 990 1000

TCGTTTCGAATGACTGGCTGTTCGGGTT-TTCGAGCTATGTCACCAACATCATGGACG-TGATGGATCGCGT

ii t t I < I t i i i i i i c ( i i lilt ill t i I lit I 

it t I i ii i t i t I i I I i i i till ill i i i iti i 

TCACCCACCCCGGCATCCCATGCATCTTCTACGA-CCATTTCTTCAAC TGGGGGTTTAAGGA CCA 

1110 1120 1130 1140 1150 1160 1170 

1010 1020 1030 1040 1050 1060 1070

GAACCCCGACGGGATGGCCTTCATTCCACTGAGAGTGATCCCATTCTACGAG AGAAGGGCGTCCCAC 

i i i i i t i t l t til i i i iiii i i i t i t i i i l iii t 

i i i i ■ i tilt iii • t i iitt i i i t t i till tii i 

GATCGCGGCGCTGGTGGCGATCAGGAAGCGCAACGGCATCAC-GGCGACGAGCGCTCTGAAGATCCTCATGC
1180 1190 1200 1210 1220 1230 1240

1080 1090 1100 1110 1120 1130 1140

AGGAA— ACGCTGCCAGGCATCACTGTGACTAACCCGGCGCGGTTCTGTGTCACCGACTTGCCGTGCATG— AC 

■ t i i i tiii i ii i t ii ii iiii iii t t ■■ t ii i i ii ii t it 

i tit i i i t t i ii i i ti it i t i t iii ii tt t ii t i it ii i ii 

ACGAAGGAGATGCC-TACGTCGCCGAGA-TA— GACGGCAAGGTGGTG— GTGA— AGA— TCG— GGTCCAGGTAC
1250 1260 1270 1280 1290 1300 

1150 1160 1170 1180 1190 1200 1210 

GCCATCTGGATCCT — TCCACGCAGCGGCCACTATTCCCCG — TCAAGATACCGAACGATGAAGTC— GCGCA 

i i ii ii t tit it i t i i i t i i tit i i i i i t i iii i i 

i t it t t i iii it iii i iitt iii i t t i t i t iii i t 

GACGTCGGGGCGGTGATCCCGGC — CGGGTTCGTGACCTCGGCACACGGCAACG— ACTACGCCGTCTGGGAG 
1310 1320 1330 1340 1350 1360 1370 

1220 1230 1240 1250 1260 1270 

TCGATCGAT AGGCA — TCTTCAA- TGTGATCAGGGCTGC— CACCTCC— AAAGCCGGT— GGCCACCCCT 

II II I I t I I It III I I I I I III! I III I till II I 

I t t I I till It til I I • I I till I III I IIII || | 

AAGAACGGTGCCGCGGCAACACTACAACGGAGCTGAAGTCTGCACTGATCCGTCATTCGATCGAGCATGAAT 
1380 1390 1400 1410 1420 1430 1440

1280 1290 1300 1310 1320 X 

GTC — GATAGT— CTTGAGGGAC GGTAGCGACGACCG TGCTTTTC— GTGAACTGCAG 

■ I ii iii i ttt ii iii* iti i i iiii i iii ii 

ii ii tit i ttt >t iitt lit i t iiii t iii ti 

TTCCTGA-AGTACATGATTCACTTCTGGTATTCACG-CGGATATGATTAACTATGTATACCTGTACCCAAAA 

1450 1460 1470 1480 1490 1500 1510

T 

1520 
BPECYADE Bordetella pertussis cyaD gene

LOCUS       BPECYADE     2040 bp ds-DNA             BCT       15-MAR-1990
DEFINITION  Bordetella pertussis cyaD gene 3' region and cyaE gene; proteins
            necessary for transport of calmodulin-sensitive adenylate cyclase-
            haemolysin (cyclolysin).
ACCESSION   X14199
KEYWORDS    adenylate cyclase; cya gene; cyaD gene; cyaE gene; hemolysin;
            secreted protein; toxin.
SOURCE      Bordetella pertussis.
  ORGANISM  Bordetella pertussis
            Prokaryota; Bacteria; Gracilicutes; Scotobacteria; Aerobic rods and
            cocci; Alcaligenaceae; Bordetella; pertussis.
REFERENCE   1  (bases 1 to 2040)
  AUTHORS   Glaser,P., Sakamoto,H., Bellalou,J., Ullmann,A. and Danchin,A.
  TITLE     Secretion of cyclolysin, the calmodulin-sensitive adenylate cyclase
            - haemolysin bifunctional protein of Bordetella pertussis
  JOURNAL   EMBO J. 7, 3997-4004 (1988)
  STANDARD  simple automatic
COMMENT     *source: strain=18323; see Y00545 for upstream cya gene; cya
            operon is organized cyaABDE, cyaB (712 aa) is initiated 78 bp
            downstream of cyaA stop; cyaB stop overlaps with cyaD initiation
            (440 aa).

            EMBL features not translated to GenBank features:
            key         from     to       description
            RBS          344    347       put. rRNA-binding site
FEATURES             Location/Qualifiers
     CDS             <1..353
                     /note="cyaD polypeptide (AA at 3)"
     CDS             355..1779
                     /note="cyaE polypeptide (AA 1-474)"
BASE COUNT      282 a    679 c    770 g    309 t
ORIGIN



Initial Score    =       82  Optimized Score =      590  Significance = 10.17
Residue Identity =      51%  Matches         =      732  Mismatches   =   495
Gaps             =      197  Conservative Substitutions           =     0



X 10 20 30 40 50 60

CTGCAGCCTG— ACTCGGCACCAGTC — —GCT-GCAAGCAGAGTCGTAAGCAATCGCAAGG-GGGC-AGCAT

i ii iii i » i i i t § i t it* i ttti i i i it i ii it iiii it 
i it iii i i i i iiii t lit i i i i i i i t !■ * it ii i i t i ii 

GCGGC-GCCGGCATCCAGGTCCAGGCTCAGCTCGACAGCAAGGACATCGGCTTTGTCAGGGCGGGCGCGCCA

X 10 20 30 40 50 60 70 



70 80 90 100 110 120 130

GCAAACGAGAAGGGTTGTGCTCAAGTCTGCGGCCGCAGGAACTCTGCTGGGCGGCCTGGCTGGGTGCGCGAC 

it i it ii tit i it t t t i i t i t v I I I II lit till I 

ii i ii ii iii i it t ii it t i iii i i ii lit i i t t I 

GCTACCGTCAA— GGTCG— GC GCCTACGACTATACGAAGTACGGAACGCTCGAAGGCAAGGTGTTGTAT 

80 90 100 110 120 130 



140 150 160 170 180 190 200

GTGGCT — GGAT-CGATCGGCACAGGCGATCGGATCAATACGTGCGCGT — CCTATCACAATCTCTGAAGCG

tit II till II I I t | t I > t I 111 ! I I II) I t 11 III I t I 

III II till t 1 t I I- I til It III t I ■ ttl I I It III I II 

GTGTCTCCGGATACGGTGGTC GACGACCG — CCAACA— GCACTCGTACCGCGTGACGATCGC GCT 

140 150 160 170 180 190



210 220 230 240 250 260

GGTTTCACACTGAC — TCACGAGGAC ATCTGCGGCAGCTC — GGCAGGATTCTTGCGTGCTTGGCCAG 

ii til iii t t t t t t i t i i i i i i ttti it til t i i I t i iiii i 

II III lit t > I till I I I t I I 1 lilt It ttl I I I I t I llll t 

GG— CGCACCCTGCCCTGGAGGTGGACGGCAAGCCGCGGCTGCTCAAGGAAGGCATGGCG— GTGC— AGGCC— G 
200 210 220 230 240 250 260 



270 280 290 300 310 320 330 

AGTTCTTCGG — TAGC-CGCAAAGCTCTAGCG-GAAAAGGCTGTGAGAGGATTGCGCGCCAGAGCGGC — TG

i t i i i i ft ill ti t it i i t i i i it i i i i t i tt iiii 

i i t iii it tit it iittt iii ■ tiiitiiiii iitt 

A — TATCCGGACCGGCTCGCGGCGCCTCATCGAGTATCTGCTCAGCCCGG— TGGCGCGGCATGCCGGCGAAA 
270 280 290 300 310 320 330



340 350 360 370 380 390 400 

GC — GTGCGAACGATTGTCGATGTGTCGACTTTCGATATCGGTCGCGACGTCAGTTTATTGGCCGAGGTTTC 

it i t i i t t i i tit i t t i t t it i ti iiii iiii it 

II I i I I t t t t 1 i t l I I ( I I II t ti tilt llll 11 

GCCTGGGGGAGCGCTAG — CATG-GCCG— CGGiTGCAAGTCAGGCGACGCGGCCGGGCCCTGGCGTTGGCGCT
340 350 360 370 380 390 400 

410 420 430 440 450 460 

GCGGGCTGCCGACGTTCATATCGT— GGCGGCG ACCGGCTTGTGGTTCGACCCGCCACTTTCG-ATGCG 

i iitt i iti i i i i t t * t t i i till i t t I t i i ti ti tit 

itttit ii t tiitttttit iiit lit i i i i t i ii tit 

GTGGGCCGGGTTCGCGCTGAGCGTGGGAGGCGGGGTGCGGGCGCGCGAT — GGCCTGGCAACGCCGCCCGCG
410 420 430 440 450 460 470 



470 480 490 500 510 520 530 

ATTGAGG — TATGTAGAG GAACTCACACAG — TTCTTC-CTGCGTGAGATTCAATATGGCATCGAAGAC

i tilt it t i t : t : t it i t i t i t i ii i i i i t i i t i i 

t t t i t ti ii i iii t ti t l t ii it ii tit iiii tii 

TTCGAGGGCCAGGCGGCGCCTGCCGTCTCGTGGCCTTGTCCGCCGCCGGCGGATC GGC— TCGACGAC 

480 490 500 510 520 530 540



540 550 560 

ACCGGAATTAGGGCG— GGCA TTATCAAGGTCGCG — 



570 580 590 600 

— ACCACAGGCAAGGCGACCCCCTTTCAGGAGTTAG 



